ABSTRACT

The purposes of this article are to analyze the concept of literacy in order to identify the parameters that must be specified in literacy definitions, to identify measurement problems associated with specifying each of these parameters, and to describe literacy assessment procedures now available for dealing with these measurement problems. The principal focus of the paper is on the development of models for identifying performance criteria that can serve as the goal of instructional programs and of the research and development programs that lead to them. The five parameters discussed are the classes of literacy behaviors, the level of performance that serves as the criterion of literate performance, the kinds of reading tasks on which the behaviors are tested, the proportion of the reading tasks that serves as the criterion of literacy on some corpus of reading tasks, and certain characteristics of the people tested, such as the levels of aptitude and perseverance represented among them. (Author/SB)
Reading literacy: its definition and assessment



JOHN R. BORMUTH
University of Chicago 



THIS ARTICLE HAS 3 PURPOSES: 1) to analyze the concept of literacy for the purpose of identifying the parameters that must be specified in literacy definitions, 2) to identify measurement problems associated with specifying each of these parameters, and 3) to describe literacy assessment procedures currently available for dealing with these measurement problems. The principal focus of the paper is on the development of models for identifying performance criteria that can serve as the goal of instructional programs and of the research and development programs that lead to them. The 5 parameters discussed here are a) the classes of literacy behaviors, b) the level of performance that serves as the criterion of literate performance, c) the kinds of reading tasks on which the behaviors are tested, d) the proportion of the reading tasks that serves as the criterion of literacy on some corpus of reading tasks, and e) certain characteristics of the people tested, such as the levels of aptitude and perseverance represented among them.



Reading ability: definition and assessment

THIS STUDY HAS 3 AIMS: 1) to analyze the concept of reading ability in order to determine the parameters that must be specified in any definition of this ability; 2) to diagnose the measurement problems associated with specifying these parameters; and 3) to describe the procedures for assessing reading ability currently used to resolve these measurement problems. The study concentrates principally on the discovery of models that would lead to the identification of performance criteria. These criteria could then serve as the ultimate goal of instructional programs, as well as of the research programs that would lead to the establishment of those instructional programs. The 5 parameters discussed are: a) the categories of behavior in reading ability; b) the level of performance that serves as the criterion of accomplished reading; c) the various tasks on which the behavior is tested; d) the proportion of tasks mastered within a corpus of tasks that could serve as the criterion of reading ability; and e) certain characteristics of the individuals tested, for example, their levels of aptitude and perseverance.



Reading ability: its definition and determination

THIS ARTICLE HAS 3 PURPOSES: 1) to analyze the concept of "reading ability" in order to identify the parameters that must be specified in definitions of "reading ability"; 2) to identify the measurement problems associated with specifying each of these parameters; and 3) to describe the procedures currently available for determining "reading ability" in order to deal with these measurement problems. The principal focus of the article is on the development of models for identifying the performance criteria that can serve as the objective of instructional programs and of the research and development programs that lead to them. The 5 parameters discussed are a) the classes of behaviors in "reading ability"; b) the level of performance that serves as the criterion of performance in "reading ability"; c) the types of reading tasks by means of which the behaviors are tested; d) the proportion of the reading tasks that serves as the criterion of "reading ability" on some bodies of reading tasks; and e) certain characteristics of the people examined, such as the levels of aptitude and perseverance represented among them.
Literacy may be defined broadly as being able to respond appropriately to written language; in this sense, it is one of man's most valued skills. Man has used writing to record, accumulate, and store his knowledge in an easily used form. Because those who were literate have been able to overcome the barriers that time and space throw in the way of communication, some have been able to master and apply technical information and thereby achieve unprecedented material prosperity. Some have been able to master and apply social and political knowledge to secure personal and political liberties for themselves. And some have been able to enlarge their perspective and satisfy their aesthetic desires through literature.

Literacy is an undeniably great benefit, but only to the literate. During the past century, nearly every movement that has sought to better man's lot has given a prominent place in its program to making him literate. And all of these programs have eventually encountered the same problem. When their proponents descend from the rarefied stratosphere of rhetoric and attempt to implement their programs, they must ask the complex question: what does it mean to be literate? None of the many approaches that have tried to answer it has provided more than a narrowly limited answer. This paper will once again address the issue, not with the naive aim of answering the question with a single deft stroke, but with the humble hope of identifying most of the major parameters of the answer and of suggesting what general form the ultimate answer might take and how it might be arrived at.

Conceptions of literacy 

Let us begin by examining some of the earlier efforts to define and assess literacy. We hear claims that there remain large numbers of illiterate people in the United States, a nation that has experienced several generations of free and compulsory public education. The late James E. Allen, Jr., former US Commissioner of Education, cited these figures (1969):

1] One out of every 4 students nation-wide has significant reading deficiencies.

2] In large city school systems up to half of the students read below expectation.

3] There are more than 3 million illiterates in our adult population.

4] About half of the unemployed youth, ages 16-21, are functionally illiterate.

5] Three-quarters of the juvenile offenders in New York are 2 or more years retarded in reading.

6] In a recent US Armed Forces program called "Project 100,000," 68.2 per cent of the young men fell below grade 7 in reading and academic ability.

If these statements indicate that there are large numbers of 
illiterate citizens within the United States, they may also be taken as 
evidence that educational institutions have failed tragically to achieve 
one of our most deeply-rooted aims — that all men should have equal 
opportunities to develop and attain their ambitions. Such reasoning 
seems to be what prompted Commissioner Allen and others to advo- 
cate massive research and development programs aimed at develop- 
ing literacy instruction that could remedy the problem. 

Need for better literacy assessment procedures 

Certainly, if the illiteracy level is so high, such programs 
would seem urgently needed and large expenditures of public moneys 
justified. Unfortunately, however, it is impossible to put much faith in these or in any other literacy statistics currently available, for none of them is based either on a careful analysis of the concept of literacy itself or on suitable methods of measurement. It may be worthwhile to examine some of the more commonly used procedures for assessing literacy and to describe briefly some data indicating that the literacy problem may be far more serious than these procedures would lead us to believe.

Functional literacy. The Bureau of the Census attempts to assess the literacy of the population by tabulating the number of people 14 years of age or over who have not completed 6 years of school. This constitutes the criterion for what is called functional literacy. In order to accept figures based on this criterion, it is necessary to make several dubious assumptions, but just one needs to be examined here.

There is no evidence to support the assumption that 6 years of schooling are sufficient to raise all students' abilities to the point where they can deal competently with ordinary reading tasks. One study (Bormuth, 1969c) found evidence that it is probably false. A representative sample of 8 articles was drawn from news publications, a cloze readability test was made over each,¹ and these tests were administered to students in grades 3 through 12. These were children from middle-class homes in a residential suburb of a large Midwestern city. The average percentage of students who were able to answer at least 35 per cent of the cloze questions on the tests was calculated. On the average article, only 33 per cent of the students in grade 6 and only 65 per cent of those in grade 12 reached this criterion.
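The criterion computation described above can be sketched in a few lines. This is a minimal illustration, assuming each student's cloze result is recorded as the proportion of items answered correctly; the function names and all scores are hypothetical, not data from the study.

```python
# Criterion used in the study: answering at least 35 per cent of cloze items.
CLOZE_CRITERION = 0.35


def percent_reaching_criterion(scores, criterion=CLOZE_CRITERION):
    """Percentage of students whose proportion-correct meets the criterion."""
    reached = sum(1 for s in scores if s >= criterion)
    return 100.0 * reached / len(scores)


def average_over_articles(scores_by_article, criterion=CLOZE_CRITERION):
    """Mean, across articles, of the percentage of students reaching the criterion."""
    percents = [percent_reaching_criterion(s, criterion)
                for s in scores_by_article.values()]
    return sum(percents) / len(percents)


# Hypothetical proportion-correct scores for one grade on two articles.
grade6 = {
    "article_A": [0.20, 0.30, 0.36, 0.50],  # 2 of 4 reach 0.35 -> 50%
    "article_B": [0.10, 0.25, 0.34, 0.40],  # 1 of 4 reach 0.35 -> 25%
}
print(average_over_articles(grade6))  # 37.5
```

Averaging per-article percentages, rather than pooling all responses, weights each article equally, which matches the phrase "on the average article."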

In other studies (Bormuth, 1971) it had been shown that students who are unable to answer at least 35 per cent of the items in a cloze readability test can gain little or no information from materials at that level of difficulty. Consequently, there seems to be little basis for claiming that a person completing 6 years of school is literate. Even graduation from high school does not appear to be a very certain criterion of literacy. The fact is that the number of years a person has been in school is a very poor index of his ability to read, for within any grade level it is common to observe very wide variations in the reading abilities of the students. But even if grade level were an accurate index of literacy, the small amount of evidence now available indicates that the grade 6 criterion is far too low and that the illiteracy rate is probably much higher than the Census Bureau would lead us to believe.

Achievement level. Others have tried to get around this problem by using, in some way, a person's achievement grade level instead of the years he has spent in school. This is a number found by giving a reading test of some sort to students at various grade placements in school, say at the 5.2 grade level. The mean test score of each group is calculated, and thereafter any student who obtains a score equal to the mean of a group is assigned that group's grade level, that is, the grade level at which students, on the average, are able to answer that number of questions on that test.
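The grade-level procedure just described amounts to a lookup in a norm table. The sketch below assumes a hypothetical table of norm-group means and assumes linear interpolation between adjacent norm groups (a common convention, though not one the article specifies); scores outside the table are simply clamped to its end points here.

```python
from bisect import bisect_right

# Hypothetical norm table: (grade placement of norm group, mean raw score).
# Means must increase with grade for the lookup below to make sense.
NORMS = [(3.2, 14.0), (4.2, 20.0), (5.2, 26.0), (6.2, 31.0)]


def grade_equivalent(raw_score, norms=NORMS):
    """Convert a raw score to a grade level score by interpolating norm means."""
    means = [m for _, m in norms]
    if raw_score <= means[0]:
        return norms[0][0]   # clamp below the lowest norm group
    if raw_score >= means[-1]:
        return norms[-1][0]  # clamp above the highest norm group
    i = bisect_right(means, raw_score)  # means[i-1] <= raw_score < means[i]
    g0, m0 = norms[i - 1]
    g1, m1 = norms[i]
    return g0 + (g1 - g0) * (raw_score - m0) / (m1 - m0)


print(round(grade_equivalent(26.0), 1))  # 5.2: equals the grade 5.2 group mean
print(round(grade_equivalent(23.0), 1))  # 4.7: halfway between 4.2 and 5.2 means
```

Nothing in the table refers to any real-world reading task, which is the point of the paragraph that follows: the output locates a student relative to other students, not relative to materials.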

But, again, it is hard to tell what these grade level scores mean. Commissioner Allen cited a study reporting that 68.2 per cent of the men in an Armed Forces study had grade scores of less than 7.0. It is impossible to say what this means. Can students with 7.0 scores read newspapers, college textbooks, or even the text in comic books competently? A grade level score does not provide us with any information on just what kinds of real-world reading tasks a person can perform competently. Consequently, we learn little about the level of illiteracy in the population when grade level scores are used to tell us either that some proportion of the population falls below a certain grade level or that some proportion is 2 grade level years below their current grade in school.

1. The cloze readability procedure will be discussed in some detail later in this paper.

Data from a study already cited (Bormuth, 1969c) shed some light on the matter. By performing a series of regressions between scores on cloze readability tests made from each of several newspaper articles and a test that gave grade level scores, it was possible to calculate that the grade level score of the average person who answered 35 per cent of the items on the cloze readability test was 10.5, indicating that the average person is literate with respect to half of the newspaper articles only after 10.5 years in school. This indicates that the study cited by Allen employed a criterion that was far too low and also that the illiteracy rate may be very much higher than estimated by that study.

Grade level expectancy. The third major way in which people attempt to assess literacy is by using the expectancy concept. According to this concept, a person has some level of aptitude for learning to read. This is usually measured by a verbal aptitude or verbal intelligence test score that is converted to a grade level score. This grade level score is said to be the person's reading expectancy, meaning that if he were working "up to his capacity," he would probably get a similar grade level score when given a reading achievement test. Hence, a person whose achievement score is, say, 2 years below his expectancy score does not seem to be profiting very well from his instruction.

It is possible to cite several strong statistical reasons why the studies that have reported these kinds of data in the past must be viewed with suspicion. But these can be put aside for an even stronger reason: the expectancy score is based on 2 grade level scores, neither of which tells anything about whether a person can perform competently on real-world reading tasks. In passing, it may also be worth mentioning that if every student were working exactly up to his capacity and if the tests now used to measure capacity and achievement were slightly unreliable (as all tests are), then, on the average, half of the students would appear to be working below capacity at all times. This is a phenomenon Allen cited as evidence of extensive illiteracy in large city schools, but one that may be merely an artifact of the random variation in test scores.
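The measurement-error argument above can be checked with a small simulation. This is a sketch under stated assumptions: every student's true achievement equals his true capacity, and each test adds independent, normally distributed error; the mean, spread, and error sizes are hypothetical choices, not values from any study.

```python
import random


def share_apparently_below_capacity(n=100_000, error_sd=0.5, seed=1):
    """Fraction of students whose observed achievement falls below their
    observed expectancy when true achievement always equals true capacity."""
    rng = random.Random(seed)
    below = 0
    for _ in range(n):
        true_level = rng.gauss(6.0, 1.5)                     # hypothetical true grade level
        expectancy = true_level + rng.gauss(0.0, error_sd)   # aptitude test with random error
        achievement = true_level + rng.gauss(0.0, error_sd)  # achievement test with random error
        if achievement < expectancy:
            below += 1
    return below / n


print(share_apparently_below_capacity())              # close to 0.5
print(share_apparently_below_capacity(error_sd=0.1))  # still close to 0.5
```

Because the two errors are independent and symmetric, the difference between the observed scores is equally likely to fall on either side of zero; the size of the error changes how far below capacity students appear to be, but not the share who appear to be below it.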
Proportion below grade level. One occasionally reads a report that such and such a percentage of the students in some school system fall below grade level and are, therefore, destined for a life of illiteracy. If this prediction were true, it would not be because the children fell below grade level. The grade level scores represent nothing more than the mean scores of the students in a given grade level. Hence, if the achievement test were well made and fairly recent, we could say with very good accuracy that about half of all children are below grade level at any given time. But, because we are dealing only with grade level scores, we still have no idea whether all, some, or even none of those children whose scores fell below grade level can respond competently to real-world reading tasks.

These remarks should not be interpreted as criticism of Allen or of anyone else who has attempted to deal with the literacy problem at the policy-making level. He and other men of goodwill sense that something is amiss in literacy training: that large numbers of people probably never reach a level of reading ability sufficient to cope with even the common reading tasks confronting them daily. In order to rally the support needed to remedy the problem, these men require evidence. It is extremely unfortunate that there is as yet no adequate evidence to place in their hands. But the fact is that we have not yet analyzed exactly what is meant by literacy and then devised appropriate methods for measuring it.

Nature of literacy 

The term literate may be used to refer to a number of different kinds of behavior, ranging from the ability to employ basic reading or writing skills to the knowledge of some body of literature. The term will be used here to refer to the ability to respond competently to real-world reading tasks. To define the term further, however, requires that we give detailed specifications of 5 parameters: 1) the behaviors we wish to observe, 2) the criterion level of performance we expect a literate person to demonstrate on tests of those behaviors, 3) the kinds of materials on which we test the behaviors, 4) the criterion proportion of the reading tasks on which the person must exhibit a literate level of behavior, and finally 5) certain characteristics of the person tested, such as his aptitude and practical needs.
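One way to see how the 5 parameters fit together is to record them in a single structure. The sketch below is an illustration only: the class, field names, task labels, and thresholds are hypothetical, and it reduces each task to one proportion-correct score, which is a simplification of what a real definition would specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiteracyDefinition:
    behaviors: tuple[str, ...]     # 1) behaviors observed (recorded, not used below)
    performance_criterion: float   # 2) minimum score counted as literate on one task
    task_corpus: tuple[str, ...]   # 3) materials on which the behaviors are tested
    proportion_criterion: float    # 4) share of tasks that must meet the criterion
    population: str                # 5) characteristics of persons tested (recorded only)

    def is_literate(self, scores: dict[str, float]) -> bool:
        """Apply parameters 2-4 to one person's scores; missing tasks count as 0."""
        if not self.task_corpus:
            raise ValueError("task corpus must not be empty")
        passed = sum(1 for task in self.task_corpus
                     if scores.get(task, 0.0) >= self.performance_criterion)
        return passed / len(self.task_corpus) >= self.proportion_criterion


definition = LiteracyDefinition(
    behaviors=("decoding", "literal comprehension"),
    performance_criterion=0.35,
    task_corpus=("news_1", "news_2", "news_3", "news_4"),
    proportion_criterion=0.75,
    population="hypothetical adult population",
)
print(definition.is_literate({"news_1": 0.40, "news_2": 0.50, "news_3": 0.20, "news_4": 0.36}))  # True
print(definition.is_literate({"news_1": 0.40, "news_2": 0.30, "news_3": 0.20, "news_4": 0.36}))  # False
```

Changing any one field changes who counts as literate, which is why each parameter must be stated explicitly before a literacy rate can be interpreted.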




14 READING RESEARCH QUARTERLY • Number 1, 1973-1974 IX/1



Comprehensive and fragmentary programs 

The 2 fundamentally different approaches taken to the 
design of literacy programs can be referred to as fragmentary and 
comprehensive. Accept, for the moment, the proposition that a person 
is literate if he can obtain all the information he needs from the 
materials he needs to read. If we view literacy in this way, we can 
see that there are 2 major determinants of a person's literacy: 1) how many comprehension skills are required by the materials that he needs to read and 2) how many of those skills he has mastered. Up to the present, virtually all literacy programs have been fragmentary. In their conception, planning, and execution, they have attempted to manipulate only one of these 2 determinants, and they have either ignored the other determinant or regarded it as unchangeable. A comprehensive literacy program takes both determinants into account. Such a program seems feasible within our current social structure and would probably be more effective and economical than fragmentary programs. Moreover, I can see no way to define and assess literacy meaningfully unless we take both determinants into account.

In order to make these matters clear, let us examine a brief 
analysis of the literacy system as it operates in our society. The pri- 
mary purpose of literacy is to enable a person to gain information 
from material. This information produces effects on his behavior that 
are considered sufficiently desirable to society to warrant its paying 
for the individual's instruction to master the literacy behaviors. And 
the effects are considered sufficiently desirable to the individual to war- 
rant his spending considerable amounts of time and effort acquiring 
the literacy skills. This creates a demand for materials containing in- 
formation of various sorts; and the publisher's job is to determine 
what kinds of information are needed, to arrange to have that infor- 
mation prepared in written form, to edit this material into a form that 
meets the needs of the consumer, and then to print and distribute it. 
The publisher's reward is great enough that he is willing to use whatever reasonable means are available to edit and tailor the readability of the materials so that they require just those literacy skills that the consumers are most likely to possess. Conversely, society's rewards are great enough that it attempts to instruct its members in whatever literacy skills the materials may require.
A person can be considered literate in this system when he can get the information he needs from the materials that he needs to read. And this is true regardless of whether we view the matter from his, the publisher's, or society's standpoint. Hence, a person may be regarded as literate or illiterate only with respect to a particular reading task, and his status relative to that material may be altered both by giving him instruction in literacy skills and by altering the materials so that the literacy skills they require match the ones he has learned.
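The two routes to changing a person's status can be made concrete with a toy model. This sketch assumes, purely for illustration, that a reading task is characterized by a set of required skills and a reader by a set of mastered skills, with literacy on that task meaning the first set is contained in the second; the skill labels are hypothetical, and real skills are not all-or-nothing.

```python
def is_literate_for(required: set[str], mastered: set[str]) -> bool:
    """True if the reader has mastered every skill the material requires."""
    return required <= mastered  # subset test


# Hypothetical skill labels, for illustration only.
reader = {"decoding", "literal comprehension"}
insurance_policy = {"decoding", "literal comprehension", "legal vocabulary"}

print(is_literate_for(insurance_policy, reader))                         # False
# Route 1 (instruction): add the missing skill to the reader.
print(is_literate_for(insurance_policy, reader | {"legal vocabulary"}))  # True
# Route 2 (readability): rewrite the material so it no longer requires it.
print(is_literate_for(insurance_policy - {"legal vocabulary"}, reader))  # True
```

A fragmentary program acts on only one argument of this test; a comprehensive program can act on both.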

It seems fairly clear that the primary requirements for mounting a comprehensive literacy program are technologies for assessing and manipulating the readability of materials and for teaching and assessing people's mastery of the literacy skills. In the system outlined, the motivations of the individual, the society, and the publisher would ensure that they adopted these techniques. Some coordination would be necessary to ensure that the materials were tailored to the skills being taught and that the curriculum taught all of the skills that are essential for transmitting information. Moreover, this coordination could lead to considerable economy in both the instructional and publishing activities. The number of skills taught could be limited and tailored to fit the personal needs of individuals, reducing instructional costs. Similarly, the materials could be tailored to fit the skills of their intended audiences, thereby increasing the market and rewards for the publisher.

Moreover, there appears to be no sensible way to define and 
assess literacy if we conceive of literacy in a fragmentary way. As 
this discussion progresses, it will become evident that there are a 
huge number of skills that we could consider literacy skills, probably 
far more than we could afford to learn. However, there is no way to select among these skills unless we take into account the materials in which these skills are required and the usefulness of those materials to society and to various types of individuals.

Past programs have been conceived and organized primarily as fragmentary programs out of administrative necessity. Because of the tradition of local control of schools in the United States, literacy programs have had to be carried on at that level, utilizing only local resources and affecting only a negligible proportion of the total population. Such a program could not be designed to teach just certain literacy skills, for doing so might have made its students illiterate with respect to a large proportion of the materials being published. And, since only a small proportion of the population was affected by a particular school's curriculum, publishers could not afford to prepare materials especially for them. At the present time, however, there are many precedents for obtaining adequate funding and administrative coordination from foundations and government to design programs that would have the administrative breadth to affect the literacy of nearly all students in the United States. The work of the so-called modern mathematics and science programs provides graphic examples of what is possible. Under such circumstances, publishers might be willing, perhaps even eager, to cooperate.

But had adequate funding and administrative coordination been available at a much earlier date, it is doubtful that a comprehensive program could have had much, if any, effect. There simply was not an adequate scientific base on which to build the necessary technology. Reading instruction and readability were practiced as crafts, whose effectiveness depended heavily on the experience and intuitions of the practitioners, rather than as technologies, which could be employed to produce predictable results. We could not identify in any reasonably acceptable way, for example, what skills were involved in literacy, what features of language were involved in those skills, or how the language features and their associated literacy skills influenced the difficulty of materials. Nor did anyone know quite how to go about finding out about such things. While we still remain largely ignorant of the nature of literacy skills, psychologists, linguists, and psycholinguists have, in the course of the past few decades, built a scientific base that seems to be at least adequate for the effective study of these matters.



Kinds of behaviors 

The first component of a literacy definition is a set of statements describing the kinds of behaviors a person must be able to exhibit in order to be classified as literate. Pertinent to this are discussions of a) the range of behaviors that must be considered (though not necessarily included) whenever a definition of literacy is formulated; b) the need to limit the range of behaviors included in a particular definition intended for practical use in research, development, or instructional programs, and the more important criteria for selecting those behaviors to be included; and c) some of the major measurement problems involved in designing tests for assessing literacy.

Range of behaviors involved in literacy 

At first glance it would not seem to be a particularly difficult task to say just what behaviors are implied by the term literacy. To say a person is literate seems to claim that he can perform some set of reading tasks competently. So all one would have to do to arrive at the sought-after literacy behaviors is to analyze those tasks to see what behaviors they required. But this first glance is deceptive, for
this problem is closely associated with another problem containing 
several complexities that have led to heated and emotionally charged 
controversies. These controversies arose out of the question of whether 
the reading act involves just the word recognition behaviors — those 
skills involved in decoding written words into spoken words — or 
whether it also includes such behaviors as comprehending that lan- 
guage, critically evaluating its truth and relevance, appreciating its 
aesthetic qualities, and so on. When this problem is properly analyzed, 
it reduces not to an either/or question but merely to a series of ques- 
tions about priorities which can be rather easily (but not painlessly) 
resolved on the basis of values shared by the protagonists on both 
sides of the argument. 

Controversy. Although this controversy has existed for a very long time in the area of reading instruction, it became a full-blown public controversy with the publication of Why Johnny Can't Read by Rudolf Flesch (1955). Flesch noted that substantial numbers of children were unable to perform competently even the most rudimentary reading behavior: decoding written words into spoken words. He attributed this fact not only to a lack of phonics content in the reading curricula used in schools but also to the presence of a considerable amount of instruction designed to teach students the higher level skills commonly referred to as comprehension, critical reading, literary appreciation, and the like. It was not his contention that these were unimportant skills to learn. Rather, it seemed to be his belief that these higher level skills were of secondary priority in the sense that they could not be learned until the decoding skills had been mastered, and that their early introduction into the curriculum interfered with word recognition instruction by diverting energy away from the acquisition of decoding skills.






But a confused controversy has continued in other forms among psychologists, linguists, and educators, centering, among other things, on the issue of whether reading curricula should include instruction in the higher level skills. Psychologists and linguists have argued that reading can be conceptualized as only those skills uniquely involved in decoding written language into spoken language and that everything else in the reading curriculum does not really teach reading skills at all, but rather something often vaguely lumped together and labeled thinking skills. A number of others, mostly reading specialists, have taken the position that the reading act could not really be broken up in this way. They argue that there is an underlying continuity in the reading act, that such a distinction is arbitrary, and that omitting instruction in the higher level skills would cripple children's potential for performing useful reading tasks.

This argument would long since have evaporated had the protagonists begun by addressing themselves to the same issue. The group that wishes to define reading as being coterminous with the decoding skills has consisted largely of scientists in linguistics and psychology. To them, identifying reading behaviors primarily involves breaking them down into small classes so that they can plan and carry out manageable scientific analyses. A scientist simply cannot perform useful theoretical work until he has obtained rigorously defined classes of phenomena to study; thus, for their purposes, these scholars were absolutely correct to place the decoding skills in a class by themselves in order to provide a fairly natural and manageable phenomenon that can be analyzed from existing linguistic and psychological theory.

On the other side of the argument, one finds mainly the specialists in reading instruction. Their objective is to provide a complete system of behaviors to permit students to cope effectively with the reading tasks encountered in the real world. When the specialists analyze these real-world reading tasks, they see that the students must learn not just the decoding behaviors but also the higher level skills, whose teaching their opponents seem to oppose. The reading specialist then labels all of these skills reading skills, but without making it clear to others that he uses this label merely as a convenient method of referring to anything that is taught during the period labeled reading in the schedules that appear in curriculum guides. To him, the label refers to instruction in how to turn a book's pages, in how to find and read page numbers, in decoding skills, and in any other behavior that he thinks a) is functional in coping with real-world reading tasks and b) can be more conveniently taught during that time period than, say, during the period labeled mathematics. Both groups, then, are apparently led into thinking that they are talking about the same thing because they both wish to use the same label. And since each side has developed its definition through careful reasoning, it seems to feel the need to jealously defend its usage against anything that appears to be a rival definition. Yet since the two definitions were designed to serve quite different purposes, they are in no way rivals.

Seen in this light, the problem of choosing the behaviors to 
be included in a definition of literacy is not a problem of identifying 
what is truly a reading behavior. Rather, the selection of the behaviors to be included in a given definition depends upon the consideration of the purpose that definition is to serve. If its purpose is purely scientific, then the criteria of conceptual and theoretical tractability seem appropriate for identifying those behaviors. But if the definition
is to serve as the statement of the objectives of an instructional pro- 
gram that purports to develop a system of behaviors having utility in 
the real world, then it is appropriate to apply stringent social, politi- 
cal, cultural, and economic criteria in the selection of those behaviors. 

What is important to note at this point is that there is no 
true definition of literacy. Rather each definition must be designed 
for the purpose to which it is to be put, and its correctness may be
judged only in terms of how well it serves that purpose. Thus, when a 
definition of literacy is being developed, it would seem rational to 
state clearly the purpose of that definition, to derive from this state- 
ment a set of criteria for selecting and excluding behaviors, and then 
to select behaviors using these criteria. It seems likely that had ra- 
tional procedures of this sort been followed in the earlier formulations 
of the concept of literacy, we might have been spared much pointless 
and often destructive controversy. 

Taxonomy of literacy behaviors. Much effort has gone into
identifying the behaviors a person must have in order
to deal with a variety of reading tasks. These taxonomies
have largely been compiled by curriculum specialists in reading, but
much of their content has been contributed by analyses in the
disciplines of psychology; in manuscript criticism in the study of
history, linguistics, and library sciences; and in a number of other areas.²

First, there are the decoding behaviors, which enable a per- 
son to map letters, letter groups and patterns, and typographical fea- 
tures of print onto oral language units. Normally this includes the 
phonics behaviors, which map the smaller graphological units onto 
language sounds; the word structure behaviors, which map whole 
syllables and affixes as units onto their corresponding sounds; the
sight recognition behaviors, which map whole words onto their corre-
sponding sounds; the context recognition behaviors, which utilize the
context surrounding a word to map the word onto its sounds; and the 
dictionary behaviors, which enable a person to locate and pronounce 
a word from its entry in the dictionary. 

Second, there are the literal comprehension behaviors, which 
enable a person to learn the information explicitly signalled in a
reading task. This normally includes the vocabulary meaning beha-
viors, which enable a person to assign the correct meanings to words
in their contexts; the sentence comprehension skills, which enable a
person to combine the meanings of words in sentences according to
patterns conforming to the syntax of the sentences; the anaphora 
comprehension behaviors, which enable a person to identify the recur- 
rences of concepts in a reading task so that the appropriate concepts 
are modified when they reoccur in sentences; and the discourse com- 
prehension behaviors, which enable a person to combine the meanings 
of sentences in a passage according to patterns signalled by the dis- 
course syntax of a reading task. 

The remaining classes of behaviors have generally been less
well analyzed than the 2 just named. The third might be described
as the inference behaviors, which enable a person to derive
information not explicitly signalled by the reading task. These behaviors
might be described impressionistically as those that occur when a
person "reads between the lines," or somewhat more formally as
logic-like processes in which statements in a text might be
substituted into logical algorithms and true sentences not in the text
computed by using predicate calculus.

2. At this point it would be inappropriate to attempt either exhaustive listings or
precise definitions of these areas of behavior. More extensive listings may be obtained
from other sources (such as Betts, 1954; Bond and Tinker, 1967; or Harris, 1962). And
the problem of defining complex cognitive behaviors such as these will receive
separate discussion in this article. The brief discussion presented here is provided merely
to give the reader a general impression of the range of behaviors that must be
considered for inclusion when a definition of literacy is being developed.

The fourth set of behaviors is generally called the critical
reading skills, and they conform roughly to the procedures known as 
manuscript criticism in the study of history. They consist of applying 
tests of the consistency of the logic of a text, verifying its factual 
claims, verifying the authority of the writer, and detecting and evalu- 
ating propaganda devices. 

The fifth set comprises the aesthetic appreciation behaviors. These
are difficult to characterize because they are typically discussed in
terms that do not readily lend themselves to behavioral analyses,
including phrases such as detecting the tone and mood of the story,
seeing the deeper meanings, detecting the pacing or rhythm of the
prose, and so on. This set of behaviors seems to be largely appropriate
for just those reading tasks that have aesthetic pretensions.

The sixth set of behaviors has traditionally been known as
the reading flexibility skills. They are the behaviors that enable a
person to speed up or slow down his reading, depending on the nature
of the task. They also enable a person to focus on just the parts of the
text containing the types of information tested by some set of
questions or described in some set of instructions, and to switch these
attentional behaviors to conform to a wide variety of such
instructions. More recently, this set of behaviors has come to be known as
mathemagenic behaviors (see Rothkopf, 1966).

The seventh and final category comprises the study skills, 
which include an assortment of behaviors that enable a person to use 
various reference devices to locate information and then to judge its 
relevance to some problem. This category also includes behaviors that 
enable a person to interpret special devices for presenting information, 
such as maps, graphs, outlines, charts, diagrams, and the like. 

Obviously a complete listing of all the behaviors implied by 
these 7 categories would constitute a work of its own. It should be 
noted, also, that other classes of behaviors could be added — the primi- 
tive reading readiness behaviors, such as those studied by Gibson 
(1970), for example. However, these rather brief descriptions should 
be sufficient to enable the reader to get some sense of the full range 
of behaviors that are included in at least some instructional programs
labeled as literacy or reading programs.

Limiting a definition

In the broadest sense of the word, literacy is the ability to 
exhibit all of the behaviors a person needs in order to respond appro- 
priately to all possible reading tasks. However, it is unlikely that a 
definition of literacy that specified all of these behaviors would have 
much utility. A definition of literacy, as that phrase is used here, rep-
resents a detailed and explicit statement of the goal of a research, 
development, or instructional program; and all such programs must 
contend with limitations on funds, time, adequacy of scientific knowl- 
edge, access to skilled personnel, and so on. And they must state a 
reasonably believable goal in the first place even to be granted the 
use of any resources at all. As a result, they invariably face the need
to limit the scope of their goal statements. 

One convenient and often necessary way to limit the defini- 
tion is by including in it only some of the behaviors normally regarded 
as literacy behaviors. However, this must be done with considerable 
care in order to avoid serious mistakes. If certain scientific consid- 
erations are ignored, for example, the definition may only appear to 
be sufficiently limited to be useful when in fact it may implicitly 
commit the program to an impossibly large task. Or, if the definition 
includes only socially trivial behaviors, the program may fail to win 
either the financial or scientific support essential for its success. 
Hence, the matter of selecting behaviors to include in a definition 
deserves some examination. 

Utility. Selecting and validating educational objectives involves
problems peculiar to reading instruction. The first of these has already
been discussed in another context: reading behavior can be viewed
either as a phenomenon that can be studied usefully to make scientific
contributions to basic linguistics, psychology, history, and other areas
of study, or as a system of behaviors having considerable economic,
social, cultural,
and political value both to the individual who has learned them and to 
the society of which he is a part. While from many points of view 
this coincidence that reading behaviors have value in both respects 
may be a happy one, it also occasions some confusion and controversy. 

For example, one psychologist (Gibson, 1970) has been conducting
an interesting series of investigations of how children learn
to recognize printed letters, and she was awarded special recognition
by her fellow psychologists for her contributions to the understanding
of the reading process. This has occasioned a considerable amount of 
wonder among educational psychologists, who regard the work as 
trivial on the grounds that the processes she was analyzing have seldom
been the source of much difficulty in instruction. In their view, if the results
of all of this research and all other research of the same type were
to be applied conscientiously to the design of reading instruction, it 
would result in almost no improvement in the rate or degree of chil- 
dren's mastery of reading behaviors. 

The important point to note here is not whether the academic 
or the educational psychologist is correct, since in a certain limited
sense, both are. Two different value systems can be and have been 
applied to this single set of reading or literacy behaviors, with the 
result that the final judgments were quite different depending on 
which value system was applied. And the same is true of most of the 
other literacy behaviors. For example, the historian would undoubt- 
edly place a high value on research that contributed to a better under- 
standing of the so-called critical reading behaviors because of the vital 
role those behaviors play in the development of his theories, or the 
specialist in literature would undoubtedly place a high value on the 
analysis of aesthetic responses to literature; yet in the context of 
instruction these 2 classes of behavior would be assigned considerably 
different values. Again, different values can be applied to the same 
literacy behavior, because each behavior functions differently in dif- 
ferent areas of activity. Hence, one can identify and include a behavior
in a definition of literacy on the basis of its utility, but unless
the purpose of the definition and the criteria used for selecting and 
rejecting behaviors have been made explicit, one cannot do so without 
a considerable risk of creating confusion. 

This is not to say, however, that scholars from academic
disciplines have nothing to say about the utility of behaviors for 
instructional purposes. Quite the contrary, they often have an excel- 
lent grasp of how the literacy behaviors with which they are concerned 
function in real-world reading tasks. The historian, for example, 
would likely be quite critical of a program that omitted instruction in
the critical reading behaviors. He would point out that such a program
would produce a population of credulous dolts who could be counted on
to learn and believe almost anything they read but who would be
continually subject to the manipulation of demagogues.

Finally, when a definition is used to identify the goals of an
instructional program, not only must whole classes of literacy beha- 
viors be selected on the basis of economic, social, cultural, and politi- 
cal criteria but also the specific behaviors within each class must be 
subjected to criteria of utility. For example, some phonics rules apply 
with very high frequency in commonly encountered words, and so 
they would generally be regarded as having high social utility. Other 
rules apply in only one or 2 words and those words occur rarely in 
English, so these rules are judged to be of low utility.
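The notion of a rule's utility can be made concrete. The sketch below assumes, purely for illustration, that a rule's utility is the summed frequency of the words it decodes correctly; the rule labels and the per-million frequencies are hypothetical, not corpus data.

```python
# Utility of a phonics rule, sketched as the total frequency of the words it
# decodes correctly. All rule names and frequencies below are hypothetical.
HYPOTHETICAL_WORD_FREQ = {  # occurrences per million words (illustrative only)
    "can": 1500, "man": 400, "cat": 120, "hat": 90, "sat": 60, "yacht": 2,
}

HYPOTHETICAL_RULES = {  # rule label -> words the rule pronounces correctly
    "short a in consonant-vowel-consonant words": {"can", "man", "cat", "hat", "sat"},
    "silent ch in yacht": {"yacht"},
}


def rule_utility(words, word_freq):
    """Sum the frequencies of the words a rule applies to (unknown words count 0)."""
    return sum(word_freq.get(word, 0) for word in words)


def rank_rules(rules, word_freq):
    """Return (rule, utility) pairs, most useful rule first."""
    scored = [(name, rule_utility(words, word_freq)) for name, words in rules.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for name, utility in rank_rules(HYPOTHETICAL_RULES, HYPOTHETICAL_WORD_FREQ):
    print(f"{utility:>6}  {name}")
```

A cutoff on this score could then separate rules worth teaching from rules not worth teaching, but where to place that cutoff is a policy decision, not something the computation settles.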

Hierarchical entrainment of behaviors. Since the cognitive 
processes underlying reading behaviors are not directly observable, 
their relationships are not always immediately apparent, and this can
have serious consequences. One of these consequences is
that, even though the literacy definition specifies that only one set of 
behaviors will be taught in an instructional program, it may in fact 
prove to be necessary to teach many additional related cognitive beha- 
viors before acceptable performance on the target behaviors can be 
obtained. 

Such behaviors are said to be hierarchically related (Gagné,
1965). The simplest case of a behavioral hierarchy may be represented
by the diagram a → b. Here the letters a and b represent
2 behaviors, of which behavior b is the more complex
and depends upon behavior a. An example of a hierarchy of this sort
might be knowing the phoneme corresponding to the letter f, which
would correspond to behavior a, and being able to assign a correct
pronunciation to the nonsense syllable FOD, which would correspond
to behavior b. The latter behavior, of course, depends upon or entrains
behavior a but also involves unique components. It follows, then, that
behavior a must be mastered before b. A somewhat similar relationship
can hold between classes of behavior. These hierarchies are symbolized
with capital letters, as in A → B. In this case, every behavior in class B
depends upon at least one behavior in class A. An example of this kind
of hierarchy is that the behaviors of assigning meanings to printed words
depend upon the behaviors in which sounds are assigned to printed
words.
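One way to make the ordering constraint explicit is to record each behavior's prerequisites and let a topological sort produce a teaching order in which every behavior follows the behaviors it depends upon. This is a minimal sketch using Python's standard `graphlib` module (Python 3.9+); the behavior labels are illustrative stand-ins for the examples in the text, not a validated hierarchy.

```python
from graphlib import TopologicalSorter

# Each behavior maps to the set of simpler behaviors it depends upon: for a -> b,
# a appears among b's prerequisites. Labels are illustrative only.
HIERARCHY = {
    "pronounce nonsense syllable FOD": {
        "phoneme for letter f", "phoneme for letter o", "phoneme for letter d",
    },
    "assign meanings to printed words": {"assign sounds to printed words"},
    "assign sounds to printed words": {"phoneme for letter f"},
}


def teaching_order(hierarchy):
    """Order behaviors so that every prerequisite precedes the behaviors that need it."""
    return list(TopologicalSorter(hierarchy).static_order())


order = teaching_order(HIERARCHY)
print(order)
```

Behaviors with no dependency between them may come out in any relative order; only the prerequisite relations are guaranteed.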

If a literacy definition lists a complex class of behaviors, it is
implicitly listing the simpler behaviors entrained by that complex
behavior. This fact presents a potentially serious problem when literacy
definitions are developed for use in instructional programs in
reading. These hierarchic relationships remain only partially understood,
and so it is unclear just what may be entrained in complex
behaviors like the critical reading skills or the aesthetic appreciation 
behaviors. It is possible that, when they are subjected to careful analy- 
sis, they might prove to be quite simple and easily taught. On the 
other hand, it is also possible that they could turn out to be extremely 
complex so that a definition that included these behaviors might 
implicitly commit the program for which it serves as a goal statement 
to a course that is quite beyond the resources allocated to that pro- 
gram. 
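The implicit commitment a definition makes can be computed as a transitive closure: start from the behaviors the definition lists and keep adding whatever they entrain. The sketch below assumes a hypothetical dependency map among the broad classes named earlier; since the real relationships are only partially understood, the map is an assumption, not a finding.

```python
# Hypothetical map from each behavior class to the classes it entrains.
ENTRAINS = {
    "critical reading": {"inference", "literal comprehension"},
    "inference": {"literal comprehension"},
    "literal comprehension": {"decoding"},
    "decoding": set(),
}


def implicit_commitment(listed, entrains):
    """Return every behavior a definition commits to, whether listed or entrained."""
    committed = set()
    pending = list(listed)
    while pending:
        behavior = pending.pop()
        if behavior in committed:
            continue  # already expanded; this also keeps cycles from looping forever
        committed.add(behavior)
        pending.extend(entrains.get(behavior, ()))
    return committed


print(sorted(implicit_commitment({"critical reading"}, ENTRAINS)))
# ['critical reading', 'decoding', 'inference', 'literal comprehension']
```

Under these assumed relations, a definition that names only critical reading commits a program to 4 classes of behavior, which is the kind of hidden scope the paragraph warns about.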

Interactions among behavioral classes. There are very good 
reasons to doubt that it is possible to draw sharp distinctions between 
classes of behaviors that are hierarchically related to each other. In 
those processes that have been carefully studied, we seem to find
hierarchic relationships running in both directions. The main evidence 
of this is that there is no set of decoding behaviors that, taken by 
themselves, are sufficient to permit the pronunciation of all the words 
a person is likely to encounter. The phonics skills, for example, have 
often been offered as the word pronunciation method par excellence. 
And, indeed, they probably do represent one of the most useful sets of 
behaviors one can employ to pronounce words. 

It is now clear, however, that the phonics skills cannot be 
employed to pronounce many words unless those skills are coupled 
with certain of the literal comprehension skills. An obvious example 
is the printed word read in the clauses "they read it yesterday" and "they
read it daily," where one cannot apply the appropriate phonics rule to
the vowel letters until he has read the rest of the sentence and
comprehended it well enough to determine the tense of the verb read.
The printed word lead in the sentences "they lead their dogs" and "it is
made of lead" presents a somewhat different situation in which the
application of the correct phonics rule to the vowel letters depends
only on the person's having assigned the word to the appropriate part
of speech, a process that is thought to be an essential component of
the language comprehension processes (Osgood, 1963).

Venezky (1967) has investigated this matter in some detail
and has shown that there is a class of words to which the phonics
rules cannot be applied directly but only after the word has been
assigned to a part-of-speech category. The printed words suspect,
imprint, and permit, for example, are pronounced differently,
depending on whether they are employed as verbs or nouns. Although
this constitutes a fairly small class of words, Goodman (1969) has
been able to provide a substantial amount of evidence to show that 
the comprehension behaviors are employed extensively by children to 
aid them in the word recognition processes. 
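The dependency of decoding on comprehension shows up in a toy decoder that cannot return a pronunciation until it is told the word's grammatical role. The lexicon and the rough respellings below are hypothetical illustrations (capitals mark the stressed syllable); the `tag` argument stands for information that, in a reader, only the comprehension behaviors can supply.

```python
# Toy decoder: the pronunciation depends on a grammatical tag that a reader
# would have to obtain by comprehending the sentence. Respellings are rough and
# illustrative; capitals mark the stressed syllable.
HETERONYMS = {
    ("suspect", "noun"): "SUS-pekt",
    ("suspect", "verb"): "sus-PEKT",
    ("imprint", "noun"): "IM-print",
    ("imprint", "verb"): "im-PRINT",
    ("permit", "noun"): "PER-mit",
    ("permit", "verb"): "per-MIT",
    ("lead", "noun"): "led",  # the metal, as in "it is made of lead"
    ("lead", "verb"): "leed",
    ("read", "verb-present"): "reed",
    ("read", "verb-past"): "red",
}


def pronounce(word, tag):
    """Return a respelling for `word` used with grammatical `tag`."""
    key = (word.lower(), tag)
    if key not in HETERONYMS:
        raise KeyError(f"no pronunciation for {word!r} as {tag!r}")
    return HETERONYMS[key]


print(pronounce("permit", "noun"), pronounce("permit", "verb"))  # PER-mit per-MIT
print(pronounce("read", "verb-past"))  # red
```

Keying the lexicon on the (word, tag) pair rather than on the spelling alone is the point of the sketch: the spelling by itself underdetermines the sounds.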

Ordinarily, the reading comprehension behaviors are analyzed 
as hierarchically entraining the word decoding behaviors, and it was 
pointed out that relationships of this kind must be taken into account 
in selecting behaviors to be included in a literacy definition. The fore- 
going discussion demonstrates that a reverse hierarchy of a sort oper- 
ates to connect the same 2 sets of behaviors. Furthermore, it seems 
likely that these 2-way hierarchies may prevail among a number of 
classes of behaviors. Research by linguists shows that while language
at a higher level of analysis, say the morphological level, is built up
out of units from a lower level of analysis, the phonological level, 
many of the phenomena at the lower levels cannot be explained except 
in terms of the theory employed at the higher levels. 
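If 2-way relationships of this kind are encoded as ordinary prerequisite links, the behaviors no longer form a strict hierarchy, and no teaching order can place every prerequisite first. A minimal sketch with Python's standard `graphlib` module shows how such a cycle surfaces; the 2-node graph is a deliberately simplified, hypothetical encoding of the decoding and comprehension relationship.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(prerequisites):
    """Return a list of behaviors forming a dependency cycle, or None if there is none."""
    try:
        TopologicalSorter(prerequisites).prepare()
    except CycleError as err:
        # graphlib reports the cycle as the second element of err.args,
        # with the first and last node identical.
        return err.args[1]
    return None


one_way = {"literal comprehension": {"decoding"}}
two_way = {
    "literal comprehension": {"decoding"},  # comprehension entrains decoding
    "decoding": {"literal comprehension"},  # e.g. "read" and "lead" need comprehension
}

print(find_cycle(one_way))  # None
print(find_cycle(two_way))
```

The reported cycle signals that the two classes cannot be sequenced strictly one before the other, which is the practical consequence of a 2-way hierarchy for a literacy definition.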

Measuring literacy behaviors

Deciding what types of behaviors one ought to expect of a
literate person presents one type of problem, but deciding how those
behaviors should be observed and measured presents problems of a
completely different order. The former is primarily a matter of social
policy-making in which one decides what social, political, cultural, and
economic values are affected by each class of literacy behaviors;
weights each class of behaviors according to the weights given the
values it affects; and then includes in the definition of literacy as many of
the most valued behaviors as practical circumstances will justify.
Measuring and observing those behaviors, on the other hand, is a
scientific and technical problem that involves constructing a theory of
the processes underlying those behaviors and then identifying test
tasks that can be performed by all, and only, those persons who have
actually acquired those behaviors. Consequently, discussion of such
testing must deal primarily with the logical and scientific issues
involved in testing literacy behaviors.
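The policy-making procedure just described (weight the values each class affects, then admit the most valued classes as far as circumstances allow) can be sketched as a greedy selection. Everything numeric here is an assumption: the value weights, the per-class effect scores, and the treatment of "practical circumstances" as a single resource budget are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BehaviorClass:
    name: str
    cost: float  # hypothetical resource cost of including this class
    # value -> how strongly this class affects that value (0..1), hypothetical
    effects: dict = field(default_factory=dict)


def weighted_value(behavior_class, value_weights):
    """Weight each value the class affects by that value's importance, then sum."""
    return sum(value_weights.get(value, 0.0) * strength
               for value, strength in behavior_class.effects.items())


def select_behaviors(classes, value_weights, budget):
    """Greedily admit the most valued classes that still fit within the budget."""
    chosen, remaining = [], budget
    ranked = sorted(classes, key=lambda c: weighted_value(c, value_weights), reverse=True)
    for behavior_class in ranked:
        if behavior_class.cost <= remaining:
            chosen.append(behavior_class.name)
            remaining -= behavior_class.cost
    return chosen


VALUE_WEIGHTS = {"economic": 0.4, "political": 0.3, "cultural": 0.3}
CLASSES = [
    BehaviorClass("decoding", 3, {"economic": 1.0, "political": 0.5, "cultural": 0.5}),
    BehaviorClass("literal comprehension", 4, {"economic": 1.0, "political": 0.8, "cultural": 0.6}),
    BehaviorClass("critical reading", 5, {"economic": 0.3, "political": 1.0, "cultural": 0.4}),
    BehaviorClass("aesthetic appreciation", 2, {"cultural": 1.0}),
]

print(select_behaviors(CLASSES, VALUE_WEIGHTS, budget=9))
```

With a budget of 9 the sketch admits aesthetic appreciation but not the more highly valued critical reading, simply because the latter no longer fits; it also ignores hierarchical entrainment, so a fuller version would charge each class for the behaviors it implicitly brings along.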

The argument pursued here has this general form: First, it is
economically and logically desirable to use verbal questions as the
primary mode of testing literacy behaviors. Second, traditional methods
of deriving verbal test questions are primitive: they do not provide
us with the explicit rationale that seems essential for tests that serve as
operational representations of research and development programs,
especially for programs that have either serious scientific pretensions
or a responsibility for accounting for the effectiveness with which their
funds were used. Finally, techniques have become available for
developing adequately rationalized literacy tests.³

Although it is necessary to restrict our consideration to the 
problems of testing the more complex literacy behaviors, omitting the 
word decoding behaviors and the study skills, the arguments presented 
apply self-evidently and with equal force to the areas eliminated.
However, the problems involved in testing the more complex literacy 
behaviors are much more complicated and have only recently been 
subjected to analyses that are scientifically adequate, making it more 
important to focus specifically on them. 

Necessity of observing only overt behaviors 

In discussions of literacy assessment, as in most discussions 
of the operations involved in testing cognitive processes, it seems 
necessary to begin with an apologia of 2 rather elementary but very 
important facts about testing. The first is an explanation of the func- 
tion of a test item or task, and the second is an explanation of the 
problems presented by the necessity to observe only overt behaviors. 

Function of the test item. Literacy behaviors, like nearly all 
cognitive behaviors, are not just a set of overt and stereotyped beha- 
viors that a person repeats over and over in nearly identical form, like 
turning a key in a lock or throwing a ball. People simply are not 
expected to read the same passage over and over. And when they read, 
the behavior of major importance is not even observable directly. 
Rather, a person is expected to exhibit literacy behaviors in response 
to passages he has never seen before. Thus, a person is literate only 
when he has learned and can apply a set of mental processes that 
enable him to respond with the appropriate set of behaviors to pas- 
sages that are new to him. 

But a mental process is an event that occurs internally, where
it is not directly observable or interpretable. It is true that we can
observe the electrical effects of mental processes, but we presently
have no way to interpret those effects, and most of the mental
processes involved in literacy behaviors are so complex that we are
unlikely to be able to do so in the immediate future.

3. Each component of this argument represents a complex set of issues, and the
discussions presented here are necessarily brief. But more detailed treatments may be
found in Anderson (1973), Bormuth (1970 and 1969a), Bormuth et al. (1970), Finn
(1973), and Hively (1968).

Instead, we are forced to observe only the objects and events 
external to the individual to determine whether he is literate. We may 
observe the materials placed before him, the instructions he is
given for those materials, and the questions he is asked about them. Then
we may observe the responses he makes. So it must be recognized that
what we are forced to observe in assessing literacy is not the processes
that we really want to observe but merely objects and overt behaviors
that we take as being signs of the presence or absence of the processes
that in fact determine whether or not a person is literate. To be 
specific, in order to determine if a person is literate, we must have 
a) a theory about the nature of the mental processes that constitute 
literacy and b) a secondary theory that connects overt behaviors in 
certain situations to the various mental processes that constitute 
literacy. 

The test task or test item is a product of this secondary the- 
ory. It functions as a set of circumstances in which a person is forced
to exhibit some sort of behavior; the nature of that behavior is inter- 
pretable within the theory as evidence that the person does or does 
not possess the mental process being studied. 

Problems with observing only overt behaviors. Quite aside
from the purely scientific problems encountered in developing the 
theories of processes and the secondary theories of testing, there is the 
troublesome problem of whether it is possible to test all of the impor- 
tant literacy processes merely by observing overt responses to tests. 
For example, the critical reading skills might include a set of processes 
that we might label the ability to sense ulterior motives of an author. 
If a very large number of items which test these processes were de- 
vised, it would still be possible for someone to claim that many of the 
processes that he thinks fall under that label will remain untested by 
any of the items in the set and by any other items derived in the same 
way. This type of assertion may be used as the motive for developing 
new types of items. But sometimes it is used with destructive intent
as the basis for the claim that testing is worthless because all testing
must rely on the observation of only overt behaviors and some mental
processes can never be observed in a person's overt behavior.

This assertion can be answered at 3 levels. First, at the pragmatic
level, we can point out that the roles performed by testing are
not merely peripheral to instruction but are actually essential compo- 
nents of it. From the point of view of the student, test items represent 
the only effective way he has of determining what it is that he is 
supposed to be learning and whether or not he is learning it. The 
instruction may contain many exhortations to him, telling him to 
strive to attain many things; but in the final analysis, the only things 
he has to learn, and the only things about which he can find out whether
he has learned them or needs to seek further instruction, are just those
processes required by the tests he is given. Also, from the point of view of
the instructor, the only evidence he has of what he has taught or 
failed to teach is obtained from the tests he uses. Consequently, at the 
pragmatic level, the argument has little force since there remains a 
need to learn those processes that can be tested and tests are an 
indispensable element in that instruction. 

Second, a somewhat more general argument can be built on
the fact that operationalism is a fundamental prerequisite for accu- 
rately communicating scientific knowledge. A verbally expressed con- 
cept is subject to almost as many interpretations as there are people 
to interpret it unless that concept has been defined in terms of pub- 
licly observable events, objects, and operations. Thus, the processes 
underlying literacy behaviors are defined jointly by the form of the 
written language to be read, the form of the questions or test tasks, 
the relationships among the test tasks and the passage, and the condi- 
tions under which the tests are given. 

At the third level, the proposition that mental processes can 
never be measured with overt behavior may be extended in arguments 
claiming that a process cannot be taught to people unless it can be 
tested and thus the untestable process cannot possibly be given atten- 
tion at a research, development, or instructional level. A proposition 
that there is some important and untestable process might indeed be 
interesting, but it requires evidence before it can be fully believed. 
That evidence would probably have to be in the form of a task that 
would evoke an overt behavior that served to index the process in 
question. And finding this evidence amounts to a refutation of the
original proposition that the behavior was untestable. Hence, the claim
seems devoid of any substantive meaning. The principal philosophical
question at issue seems to be: Of what consequence can a mental
process be if it has not yet been demonstrated to have any
manifestations in a person's overt behavior? We cannot even make a convincing
claim that such a process exists without refuting that claim.

Selection of a testing mode 

There seem to be just 2 major classes of test tasks used to 
measure literacy behaviors. The first is the performance task, which 
requires a person to read some passage and then to demonstrate a 
literacy behavior by performing a task that involves either concrete 
objects and events or pictures of objects and events. One such task 
might require a person to read instructions for assembling a bicycle 
and then have him either actually assemble a bicycle or discriminate 
among pictures depicting correct and incorrect methods of assembling 
it. The second major class of test task is the verbal question, which 
consists of an interrogative sentence requiring a response; both the 
question and the response are derived from the language in the pas- 
sage. The person is required to read the passage and then either to 
write, speak, or select the response from a group of alternative re- 
sponses. This type of item may range from those that ask a person to 
pronounce a word to those that ask him to induce and describe the 
moral principles that govern the behavior of the hero of a story. It 
should be noted that the principal distinction between the verbal ques- 
tion and the performance task is not whether one employs language 
in the test task. Both invariably do, at least in the instructions for the 
task. Rather, the distinction is that a verbal test question involves only 
language in both the question stem and the response. 

Evaluation of performance tasks. The performance task super- 
ficially seems to provide the most valid type of literacy test. Perhaps the 
ultimate criterion of literacy would be obtained by giving a person a 
passage to read and then following him about through his normal life 
routines and observing whether the passage had the appropriate ef- 
fects on his behavior. This, of course, is a preposterous proposal 
because of the enormous expense involved, if for no other reason. 
Consequently, it is necessary to employ some artificial but more con- 
venient testing procedure and then infer that a correct response on 
this artificial task may be taken as valid evidence that the person 
would be found to respond correctly in his normal life routines if we 
were to follow him about. This can be referred to as the pragmatic
validity of a test task.

The performance task attempts to gain its validity by simu-
lating situations the person might encounter in his normal life rou-
tines, and it gains considerable practical usefulness because these 
simulations can be performed at the convenience of the tester. Still 
greater economy is obtained by using pictures instead of concrete 
objects and events. However, it should be recognized that pictures may
reduce the item's apparent pragmatic validity, which depends on
the apparent quality of the analogy between the performance task and
the normal life situation. Moreover, whether or not a type of item is
actually pragmatically valid depends solely on its experimentally
demonstrated ability to predict
appropriate behaviors in the person's normal life routines. Since there 
have been no studies attempting to demonstrate the pragmatic validity 
of performance items as a class, it must be said that the pragmatic 
validity of any performance item is apparent only, and not demon- 
strated. 

Indeed, it would undoubtedly be difficult, if not actually
impossible, to demonstrate the pragmatic validity of the performance
item as a class. To do so, we would have to define this class of items
in a manner that would permit us to draw samples of items that we
could be certain were unbiased representations of the total population 
of performance items. Then and only then could we conduct studies 
of their pragmatic validity, studies that would permit us to infer that 
the properties of the samples of items were also properties of the other 
items in the population. It is hard to determine even where one might
start in an effort to define the population of performance items in such
a manner that a random sample might be drawn from it. Possibly we
might begin with a passage and identify all the situations a group of
people have encountered in which they could have demonstrated their 
literacy with respect to that passage and then we might select from 
these, those situations that might be suitable for testing purposes, and 
finally we could study the pragmatic validity of the tasks so selected 
with respect to the remaining tasks. But this still leaves us wondering 
what it might mean to identify all the situations a group of people 
encounter and how one might go about simulating these. For the 
former, one would obviously have to have at least a theory of seman- 
tics that systematically related language in passages to situations; no 
h theory now exists. For the latter, one would require a systematic 




' ry for relating one complex physical situation to another — another 
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nonexistent theory. In the final analysis, then, performance tasks have 
only apparent pragmatic validity, but there is very little prospect that 
their actual pragmatic validity can be demonstrated for the class as a 
whole. 

The second limitation of the performance item, as mentioned above,
is the rather obvious one of expense.

The third limitation is a severe one. It is impossible to use 
the performance item to test the full range of literacy behaviors. A 
substantial amount of language is used to refer to impossible and
unobservable events and objects, such as "The elf thought hard about
the loss of magical powers" or "God is a disembodied power"; and some
language refers to observable but extremely abstract notions, such as
"The search for truth is the quest for power." It becomes difficult to
imagine a way the performance task could be used to assess a person's 
literacy with respect to printed language of this sort. So unless it could 
be shown that the processes underlying responses to statements of 
these types were identical to the processes underlying responses to 
statements about real and concretely observable things, it must be rec- 
ognized that performance tasks are applicable only to language that 
deals with concrete and observable things. 

Evaluation of verbal questions. The verbal test question 
seems to escape most of these problems. It seems entirely possible to 
determine experimentally the pragmatic validity of verbal items. It is 
possible to develop algorisms that produce whole populations of items 
(Bormuth, 1970) in such a way that it is possible to either generate 
or select unbiased samples of items, and it is therefore possible to 
conduct experiments to determine the pragmatic validity of this type
of item. It was also argued in the same source that it is at least
conceptually possible to develop similar definitions for any verbal 
question that is relevant to a passage. 

On first analysis the verbal test question seems to involve a
circularity that has an undesirable effect on certain classes of
questions. The verbal task tests a person's responses to language
merely by giving him a question that is also language and then
observing his response, which is still more language. At no time is it
necessary for the person to make a response to the objects and events
referred to by that language, demonstrating that he actually
understood it. That this is not only a possible but even fairly common
phenomenon can be seen from a consideration of these sentences:
1) All daxes have wobs.

2) We have daxes in our dorf.

3) Do we have wobs in our dorf?

4) What has wobs in it?

5) Who has a dorf?

Although questions 3, 4, and 5 are fairly easily answered by most 
speakers of English, one could hardly say that they understood sen- 
tences 1 and 2, since several of the lexical morphemes in them were 
in fact nonsense syllables. 
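The point can be made concrete with a toy program. The Python sketch below is a hypothetical illustration, not a procedure from the article: it stores sentences 1 and 2 as bare symbol relations (the dictionaries `HAS`, `CONTAINS`, and `OWNER` are invented names) and answers questions 3, 4, and 5 by chaining them, without any representation of what a dax, a wob, or a dorf is.

```python
# Hypothetical encoding of sentences 1 and 2; nothing here says what a
# dax, a wob, or a dorf actually is.
HAS = {"dax": "wob"}          # 1) All daxes have wobs.
CONTAINS = {"dorf": ["dax"]}  # 2) We have daxes in our dorf.
OWNER = {"dorf": "we"}        # 2) ... our dorf.


def in_place(thing, place):
    """True if `thing` is in `place`, directly or via something that has it."""
    return any(m == thing or HAS.get(m) == thing for m in CONTAINS.get(place, []))


def dorf_has(thing):
    """Question 3: 'Do we have <thing> in our dorf?'"""
    return in_place(thing, "dorf")


def what_has(thing):
    """Question 4: 'What has <thing> in it?'"""
    return [place for place in CONTAINS if in_place(thing, place)]


def who_has(place):
    """Question 5: 'Who has a <place>?'"""
    return OWNER.get(place)


print(dorf_has("wob"))   # question 3
print(what_has("wob"))   # question 4
print(who_has("dorf"))   # question 5
```

Every answer comes out right, yet the program "understands" nothing, which is exactly the verbalism the text warns about.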

However, there are many classes of verbal questions that can 
be defined, and this effect seems to be limited only to a few of those 
classes. Consider, for example, this sentence set: 

6) The youth mounted the steed.
7) Who climbed on the horse?
8) Who mounted the steed?

It is much less likely that verbalism (answering from the wording
alone, without understanding) could occur on 7 than on 8, and each
of these represents a different class of items that can be rigorously
defined. Anderson (1973) has explored the evidence on this
matter in some detail. Consequently, while the apparent pragmatic
validity of some types of verbal questions may be questionable, those 
classes of items can be defined and separated from those classes of 
items that appear likely to be shown to have acceptable pragmatic 
validities. 
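One crude way to see the difference between items 7 and 8 is to measure how many of a question's content words are copied verbatim from the passage sentence. The sketch below is a hypothetical heuristic, not Anderson's analysis and not a rigorous item-class definition; the stop list `STOP` and the function names are invented for illustration.

```python
import re

# A tiny, hypothetical stop list of function words and wh- words.
STOP = {"the", "a", "an", "on", "who", "what", "did", "do", "in", "our"}


def content_words(text):
    """Lower-cased words of a sentence, minus the stop-listed words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP}


def verbatim_overlap(passage_sentence, question):
    """Share of the question's content words copied verbatim from the passage."""
    q = content_words(question)
    if not q:
        return 0.0
    return len(q & content_words(passage_sentence)) / len(q)


sentence = "The youth mounted the steed."
print(verbatim_overlap(sentence, "Who mounted the steed?"))     # question 8
print(verbatim_overlap(sentence, "Who climbed on the horse?"))  # question 7
```

Question 8 scores 1.0 (it can be answered by matching surface words), while question 7 scores 0.0 (it requires recognizing that "climbed on" and "horse" paraphrase "mounted" and "steed").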

The verbal question is fairly inexpensive to construct and use. 
This is not to say that the verbal questions generated by just any 
speaker of English would suffice for testing literacy behaviors; it
requires the skills of a person highly trained in linguistics and
item-writing theory to prepare acceptable items. Nor is this the claim
that we already know how to write every type of item that might be
employed in literacy definitions. To reach this point of development will
require considerable investment in research. Rather, it is simply the 
claim that the verbal question will generally cost less to prepare and 
use than its major rival, the performance item. 

The verbal question is further recommended by the fact that
it is equally applicable to all language. Questions are, in fact,
nothing more than transformations on the syntactic and semantic
structures underlying the language in passages. Sentences 4, 5, 7, and
8 are examples of questions derived by applying semantic and syntactic
transformations to the sentence to which each, respectively, is
relevant. Number 3, on the other hand, is derived from the syntactic
relationship underlying and connecting the continuous discourse
represented by sentences 1 and 2. Some of the details of these question
transformations have been examined elsewhere (Bormuth, 1970).

Finally, regarding verbal questions as transformations on the
structures of the language in a text provides the verbal question with
numerous advantages. Of greatest importance in the immediate
context is the fact that these question-derivation transformations
enable us to give exact definitions of classes of items⁴ and subsequently
to use these definitions of item classes to give equally exact
definitions of literacy behaviors. With respect to the performance item,
it is extremely difficult to define classes of items in an exact manner
because doing so requires that we possess well-developed semantic
theories and theories that relate physical situations to each other,
theories that are presently so poorly developed as to be almost
nonexistent. As a result, it is impossible to say with objective
certainty that 2 different performance items are members of the same
class or of different classes. And when one cannot even say that 2
collections of items are at least formally different, there is no logical
justification whatever for claiming that the mental processes tested by
each population of items are in some respect homogeneous within the
populations and systematically different from the processes tested by
other populations. In the case of the verbal question, however,
differences among classes of items can be denoted by differences in
the transformational procedures by which they are derived, thereby
providing at least the first logical basis for operationally defining
different classes of literacy behaviors. Moreover, there is now strong
evidence that the classes of questions that are generated by syntactic
transformations do, in fact, test homogeneous categories of behavior
(Bormuth et al., 1970).

One implication of this last statement is that the rationale
and technology that underlie all educational test writing fall short
of what might be considered scientifically acceptable. Thus, the
benefits of treating verbal questions in a scientifically acceptable
manner can be attained only after considerable effort has gone into
the research necessary to lay the scientific base for the required
technology.

4. The author (1970) suggested that question transformations defined by
Chomsky and others could serve as a prototype for these definitions. However,
this proposal turned out to encounter several difficulties because of
deficiencies in transformational grammar. Finn (1973) has since found that
algorisms based on a case grammar seem to overcome most of these problems.



Definir-^ and assessing literacy bormuth 



35 



Tests made by traditional procedures 

Having identified the verbal question as the best mode of testing
literacy behaviors, and having acknowledged that its value is only
potential because the methods by which it has traditionally been
made do not rely on the rational procedures of a science, we should
examine traditional test-making procedures and some of the problems
that grow out of them.

Traditional test-writing procedures. The traditional method
of writing tests involves 4 steps (Bloom et al., 1956). First, the test
writer lists each of the mental processes he wishes to test. These
processes form a set of column headings in a table or matrix. Second,
he lists all of the different types of subject matter he perceives as
being taught by a passage and that he wishes to test. This list is
placed in the left-hand column of the table, and each item on the list
serves as a row label. This forms a table of the type illustrated in
Table 1, where the items of content are represented by the symbols
C1, C2, ..., Cn and the mental processes are represented by the
symbols P1, P2, ..., Pm. Third, he attempts to write for each cell of
his table the type of item that permits him to test a person's
knowledge of a given item of content by having him exhibit it using
whatever mental process serves as the column heading. For example,
suppose that P1 stood for the mental process involved in
comprehending the main idea of a paragraph and C1 stood for content
dealing with the structure of atoms; then the item for cell C1P1
would be written in a manner that appeared to the test writer to force
a person to demonstrate his ability to comprehend the main idea of a
paragraph that dealt with the structure of an atom. The test writer is
not provided with any definite set of operations for deriving these
items. Rather, most writers on this topic (see Davis, 1964, p. 262, for
example) regard the actual formulation of the item as a quasi-artistic
endeavor. Finally, a jury reviews his work to see if they agree with it
or if he needs to revise it.

Table 1. Illustration of a test writer's matrix

                              Mental processes
                        P1        P2        ...     Pm
    Content     C1      C1P1      C1P2      ...     C1Pm
    items       C2      C2P1      C2P2      ...     C2Pm
                ...     ...       ...       ...     ...
                Cn      CnP1      CnP2      ...     CnPm
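The blueprint step itself is mechanical and can be sketched in a few lines of Python. The row and column labels below are hypothetical (only the main-idea process and the atom-structure content come from the text); the sketch shows that the matrix is just the Cartesian product of content items and mental processes, while the step that matters, writing the item for each cell, has no rule to implement.

```python
from itertools import product

# Hypothetical row labels (C1..Cn) and column labels (P1..Pm) for Table 1.
content_items = ["C1: structure of atoms", "C2: chemical bonds"]
mental_processes = ["P1: main idea", "P2: stated details", "P3: inference"]

# One cell per (content, process) pair; the writer owes one item per cell.
cells = list(product(content_items, mental_processes))

for content, process in cells:
    # Nothing in the traditional procedure says how to write the item for
    # this cell; that step is left to the writer's judgment.
    print(f"{content} x {process}: item to be written")

print(len(cells), "cells =", len(content_items), "x", len(mental_processes))
```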

Lack of operationalism. This conceptualization of item writing
laid the basis for all modern test theory. Particularly important
was the insight it gave us into the dual nature of the test item.
That is, an item not only tests knowledge of some information; it also
tests a person's competency to perform the processes necessary to
derive that information from his instruction. However, it left a
number of problems to be resolved. In one way or another, virtually all of
the criticisms that can be leveled at this procedure grow out of its 
heavy reliance on the personal judgments and intuitions of the test 
writer. Or stated another way, the criticisms grow out of the absence 
of operationalism of the procedure — the absence of specific instruc- 
tions for carrying out each step. 

The test writer is told that he should test a mental process 
only when it is appropriate for the passage, but he is never told by
what rules one decides if it is appropriate. The test writer is also told 
to write items that test those mental processes, but he is never told 
what the form of those items may be. And he is told to list the con- 
tent topics he thinks the passage deals with, but he is never given any 
instruction on how to identify these topics or on how grossly or nar- 
rowly he should analyze these topics. 

As a result of this looseness in the procedure, it seems doubt- 
ful that a test made in this way could meet the ordinary requirement 
of operational replicability, which is imposed on all activities laying 
claims to scientific status. Before an activity can be regarded as useful 
for making verifiable statements, we ordinarily demand that it be 
operationalized to the point that others working independently can
perform the same operations and verify the results.

Somewhat the same demands are placed on the evaluation of
programs that employ public funds and represent matters of public
policy; only in this context they are phrased as the demands for
accountability, understandability, and freedom from personal bias. If
2 test writers cannot independently replicate each other's work, and
it is extremely unlikely that they could, it becomes immediately
apparent that the concepts of the mental processes, the subject matter
topics, and the question-writing procedure mean different things to
each test writer and are therefore not expressed in a form that is
understandable and can be communicated. And far from being
impartial, the results on such tests must be regarded as biased by
whatever test writer happens to prepare the tests.

Inversion of the validity question. The traditional approach
to item writing takes a peculiarly inverted approach to the question of
what mental process is tested by a given item. It simply assumes that,
if the test writer and his jury agree that an item measures a particular
mental process, then that is, ipso facto, what the item tests. It does
not view this as a matter that should be established by scientific
procedures in which one would set out to isolate a process and study
its nature, but rather as a matter to be settled only by a fiat of the
test writers. Consequently, the labels on tests developed by traditional
methods are highly suspect. The implicit claims that they make cannot
be verified, because the test writer is permitted to use whatever label
he feels is appropriate; and the lack of replicability of the work of any
test writer who uses traditional test-writing methods shows that the
application of these labels is highly idiosyncratic, if not actually
arbitrary. Again it can be seen that tests made by these procedures
have impaired value for use in scientific analyses. Similarly, these
tests cannot be taken as impartial evidence of the effectiveness of
instructional programs, or as evidence for decisions that influence the
lives and wealth of people, since the results are likely to reflect the
conscious and unconscious biases of the test maker.

At this point it may be appropriate to note that test writers
themselves have long been aware of and concerned about the problems
inherent in their procedures. But they have also been faced with the
urgent ongoing need for tests in the schools. Consequently, they have
had to do as well as they could using methods that are less than
scientific until some way was found to develop better test-writing
procedures.

Operationalizable test-writing procedures

While verbal questions made by traditional procedures are of
dubious value, this is not a property of the item itself; rather, it is
merely a property of the way it is derived. That is, the items ordinarily
produced by traditional test-writing procedures are good items in the
sense that they do test some sort of behaviors that at least intuitively
appear to be important behaviors. What is required, however, are
item-derivation procedures that can produce populations of items in a
replicable fashion. There are now 2 such procedures that may be
useful for testing literacy behaviors: the cloze and the wh- question
procedures.

Cloze procedure. The cloze procedure is a way of making tests 
by mechanically deleting the words in a passage of written language 
and replacing each with an underlined blank of a standard length. 
People taking the tests are expected to guess what word was taken 
out of each blank and write it in that space. There are a variety of 
ways to select the words to be deleted. One can delete every Nth word,
every second noun, all adjectives, and so on. What distinguishes a 
cloze test from an ordinary deletion test, though, is the fact that in 
a cloze test the words may be selected for deletion only by a completely 
replicable set of rules. Using introspective concepts like key words is 
ruled out. 
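Because a cloze test is defined entirely by a deletion rule, its construction can be written as a short program. The Python sketch below assumes the every-Nth-word rule (deleting every fifth word is a common choice) and exact-word scoring, one common scoring convention. The function names and the tokenizing regular expression are illustrative assumptions, not a published standard.

```python
import re

# A token splits into leading punctuation, the word itself, and trailing punctuation.
TOKEN = re.compile(r"^(\W*)([\w'-]+)(\W*)$")


def make_cloze(passage, n=5, start=0, blank_len=10):
    """Replace every nth word (from index `start`) with a blank of fixed length.

    Returns (cloze_text, answer_key). Tokens are whitespace-separated; a token
    with no word characters is left in place rather than deleted.
    """
    out, key = [], []
    for i, tok in enumerate(passage.split()):
        m = TOKEN.match(tok)
        if m and i >= start and (i - start) % n == 0:
            lead, word, trail = m.groups()
            key.append(word)
            out.append(lead + "_" * blank_len + trail)
        else:
            out.append(tok)
    return " ".join(out), key


def score(responses, key):
    """Exact-word scoring (case-insensitive): one point per restored word."""
    return sum(r.strip().lower() == k.lower() for r, k in zip(responses, key))


passage = "The boy climbed the hill and looked down at the town below."
cloze_text, key = make_cloze(passage, n=5)
print(cloze_text)
print(key)
```

Any two people who run this rule on the same passage get the same test, which is the replicability the text requires.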

The advantages of such a procedure are primarily that the 
tests made in this fashion are completely replicable, making true 
validity studies possible. One can define the population of all items 
where predicate adjectives are deleted. And this makes it possible to 
draw a random sample of such items, to study their properties and 
then to attribute the results to that population of items. It can also be 
claimed that items made by these procedures do not reflect the biases 
of the test maker. In these respects the cloze procedure satisfies some 
of the most basic requirements necessary for an acceptable test of 
literacy. 
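The sampling argument can also be sketched. In the Python example below, a deletion rule defines a population of items, and a seeded random generator draws a sample that anyone can reproduce. The rule `long_word` (delete any word of four or more letters) is a hypothetical stand-in; the predicate-adjective rule mentioned in the text would need a parser, which this sketch does not attempt.

```python
import random
import re


def item_population(passage, rule):
    """All (position, word) pairs the deletion rule selects: the item population."""
    words = re.findall(r"[A-Za-z']+", passage)
    return [(i, w) for i, w in enumerate(words) if rule(w)]


def sample_items(population, k, seed):
    """Simple random sample of k items; the same seed reproduces the same sample."""
    return random.Random(seed).sample(population, k)


def long_word(word):
    """Hypothetical stand-in rule: delete any word of four or more letters."""
    return len(word) >= 4


passage = "The boy climbed the hill and looked down at the town below."
population = item_population(passage, long_word)
print(population)
print(sample_items(population, 3, seed=1973))
```

Because both the population and the sample are fixed by explicit rules, findings about the sampled items can legitimately be generalized to the whole population.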

However, cloze items have a rather serious shortcoming,
because it is difficult to relate the items to the theory of language
comprehension. In this theory (Osgood, 1963; Mowrer, 1954)
comprehension is regarded as taking place by a series of events
through which the meaning of one word or phrase is combined with or
modified by the meaning of another word or phrase. And the character
and order of these modifications are regarded as being controlled by
the syntax of the text. Thus, in sentence 9 the word "boy" might be
thought to be modified by "the", "hill" by "the", "climbed" by "the hill",
and "the boy" by "climbed the hill", in that order.

9) The boy climbed the hill.

[Figure: tree diagram of sentence 9 with its modifications labeled a
through d; a joins "the boy" to "climbed the hill", c joins "climbed" to
"the hill", and b and d are the two modifications by "the".]

Ordinary verbal questions that begin with wh- words like who, what,
where, and so on can be directly related to this theory since, unlike in
the cloze procedure, whole words, phrases, clauses, and even sentences
may be deleted. For example, "What did the boy do?" is derived by
deleting the whole predicate "climbed the hill". Thus the verbal
question can be regarded as testing associations at each of the various
points at which modifications occur in a sentence. The question "What
did the boy do?" tests the modification marked a in sentence 9, the
question "What did the boy climb?" tests the modification marked c,
and modifications b and d cannot be tested, presumably because they
primarily carry syntactic information rather than semantic information.
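The modification structure of sentence 9 can be represented as a small binary tree whose nodes are modifications, listed innermost first. The Python sketch below is a hypothetical encoding (the `Mod` class and its `question` field are invented for illustration). A post-order walk reproduces the order given in the text, and only the phrase-level modifications, a and c, carry a wh- question.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Mod:
    """One modification: `left` and `right` combine, in surface order."""
    left: "Node"
    right: "Node"
    question: Optional[str] = None   # wh- question that tests it, if any


Node = Union[str, Mod]


def text(node: Node) -> str:
    """Surface words covered by a node."""
    return node if isinstance(node, str) else f"{text(node.left)} {text(node.right)}"


def modifications(node: Node) -> List[Mod]:
    """Post-order walk: inner (lower-level) modifications come before outer ones."""
    if isinstance(node, str):
        return []
    return modifications(node.left) + modifications(node.right) + [node]


# Sentence 9, "The boy climbed the hill."
subject = Mod("the", "boy")                                          # by "the"
obj = Mod("the", "hill")                                             # by "the"
predicate = Mod("climbed", obj, question="What did the boy climb?")  # c
sentence9 = Mod(subject, predicate, question="What did the boy do?") # a

for m in modifications(sentence9):
    print(f"{text(m)!r:28} -> {m.question or '(syntactic only; not tested)'}")
```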

On the other hand, because only single words are deleted in
the cloze procedure, only the lowest-level modifications can be directly
tested.⁵ And some of these may primarily test structural modifications,
as when the word "the" is deleted. Possibly the most serious
disadvantage of the cloze test is the fact that the individual test items
are difficult to interpret. When we use questions, it is a fairly simple
matter to relate the test item to the structure of the text and thereby
interpret what process the question tests. An inspection of cloze items,
however, shows that responses to most of them depend on a variety
of processes, and it is difficult to identify those processes. However,
reviews of the rather extensive research literature on this type of test
seem to show that what cloze tests measure is indistinguishable from
what is measured by ordinary comprehension questions.

5. Some people have interpreted this fact to indicate that the cloze procedure tests
only the short-range constraints in a passage. And this can be equated to testing only
simple factual information in the passage. This interpretation, however, lacks
support in research. It is true that the short-range constraints have a powerful
effect on the response (McGinitie, 1960), but they are by no means sufficient to fully
explain cloze responses (Taylor, 1954).
Wh- questions. As just noted, the verbal questions that
generally begin with wh- phrases such as who, what kind of, what did,
and so on do have the desirable property that they can be related
directly to the theory of comprehension. And, although in the past it
was possible to derive them only by traditional test-making methods,
it is now possible (Finn, 1973) to derive them by using procedures
that allow one test maker to independently replicate another's work
and allow one to give precise definitions of a number of populations
of items.

This is accomplished by regarding the question as being derived
from the language in a passage through a set of semantic and
syntactic transformations. For example, each of the questions
mentioned in the paragraphs immediately above can be derived by a
set of transformations that can be crudely described as deleting one
branch of a modification, replacing the deleted branch with an
appropriate wh- phrase, and then, if it is not already there, shifting
the wh- phrase to the front of the sentence. This description, of
course, neglects many of the details of the transformations; but what
is important is the fact that it is possible to devise rules that exactly
describe how each of the various classes of wh- questions can be
derived. And this fact makes it possible to state that wh- questions
derived in this way seem to have all of the basic properties necessary
to develop them into fully satisfactory tests of literacy behaviors, such
as ability to get the important points, ability to get the main ideas,
ability to comprehend the structure of the knowledge or material, and
so on. The definition of such items depends on being able to assign a
syntax to a passage that connects its sentences and larger segments of
discourse to each other in an explicit and replicable fashion. Such an
analysis seems feasible to develop at this time, and some segments of
it have been developed. This syntax is then used to define various
classes of questions. Bart's (1970) work strongly suggests that it is
also possible to apply the syntax of Aristotelian logical algorithms to
texts in order to define classes of items that test what have been
known in traditional terms as the inference skills and many of the
critical reading skills. This recent research provides fairly good
grounds for the claim that all important tests of literacy behaviors
may eventually be derived by this type of procedure.
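The crude description above (delete a branch, substitute a wh- phrase, front it) can be sketched for the simplest case. The Python example below is not Finn's algorism: it assumes the sentence has already been parsed into subject, past-tense verb, and object, and it takes the verb's base form (needed for do-support) from a hand-supplied field. `Clause` and `wh_question` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    subject: str   # e.g. "the boy"
    verb: str      # past-tense form, e.g. "climbed"
    base: str      # base form for do-support, e.g. "climb" (supplied by hand)
    obj: str       # e.g. "the hill"


def wh_question(c: Clause, target: str, wh: str) -> str:
    """Delete one branch, replace it with a wh- phrase, and front that phrase.

    Limited to past-tense subject-verb-object clauses; real rules must cover
    tense, auxiliaries, embedding, and much more.
    """
    if target == "subject":
        # The wh- phrase already stands at the front; nothing to shift.
        return f"{wh.capitalize()} {c.verb} {c.obj}?"
    if target == "object":
        # Fronting an object triggers do-support: "did" + base form.
        return f"{wh.capitalize()} did {c.subject} {c.base}?"
    if target == "predicate":
        # The whole predicate is replaced by "do" and questioned.
        return f"{wh.capitalize()} did {c.subject} do?"
    raise ValueError(f"unknown branch: {target}")


s9 = Clause(subject="the boy", verb="climbed", base="climb", obj="the hill")
print(wh_question(s9, "predicate", "what"))
print(wh_question(s9, "object", "what"))
print(wh_question(s9, "subject", "who"))
```

Each rule is explicit, so two test makers applying it to the same parsed sentence produce the same question.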
However, one fact should be made plain. The great benefits
that can be attained through this procedure of test writing cannot be
arrived at without a very considerable amount of research, and this is
a type of research that the educational research community is only 
partially prepared to undertake. The research deals primarily with the 
calculi of linguistics and logic, areas of research for which only a few 
educational and psychological researchers have been trained. Conse- 
quently, getting research of this type under way will necessarily in- 
volve a considerable amount of interdisciplinary research and training 
programs. 



Criterion of literate performance 

The second parameter that must be specified in a definition 
of literacy is the level of performance a person must exhibit on a test 
before he may be regarded as literate. In literacy assessment we wish
to perform binary classifications of people as being either literate or
subliterate. And being able to do so as an either/or classification is
vital, for we can then use the person's classification for making
decisions that are important for him and for the society as a whole. When
the individual becomes literate, he can stop using up irreplaceable 
fractions of his life learning literacy skills and turn those skills to 
more directly productive activities. And society can stop spending 
money for expensive instruction in literacy skills and turn its re- 
sources to other tasks. In both cases the criterion should provide a 
rational procedure for deciding to terminate one type of activity and 
to commence different activities. Unfortunately, groups of people have 
an annoying tendency to exhibit a continuous range of scores on tests 
rather than a tendency to fall into 2 well-separated clusters of scores. 
Consequently, there is no natural or immediately obvious way to make 
the binary decisions required. 

Problem 

Thus it can be seen that what we are really dealing with at
this point is the classic problem of "How good is good enough?", where
goodness is measured along a continuous scale having no natural
boundaries that would facilitate metrically clean and logically neat
binary decisions about when a person's performance on literacy tasks
is good enough to warrant his being labeled as literate. Or, stated
rationally, the problem is to assign social values to test scores and
then to identify the score that has the greatest value to a person. This
problem can be solved when it is properly conceptualized. But first it
may be instructive to look at previous approaches to its solution.

Earlier criterion scores. At various times a number of crite- 
rion scores have been advocated. Each has been the subject of some 
scepticism for both technical and philosophical reasons. Let us begin 
by noting 3 of the better-known criterion scores. One of the earliest 
and most widely used criterion scores was proposed by E. L. Thorndike
(1917). He recommended that a student should be able to answer
at least 75 per cent of the questions on a test made from a sample of
an instructional material, if that material were to be considered
suitable for use in that student's instruction. Instructional programmers
sometimes employ the so-called 90-90 criterion, wherein they attempt
to revise and improve their materials until 90 per cent of their students
can answer 90 per cent of the questions given them at the termination
of instruction. Finally, some have interpreted (probably erroneously)
the writings of Bloom (1968) and Mayo (1970) as advocating that a 
student's instruction in a body of content be continued until he can 
demonstrate perfect performance on a test of that content. 
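Stated operationally, the first two criteria are simple threshold rules. The Python sketch below encodes them as just summarized; the function names and the score data are hypothetical.

```python
def meets_90_90(scores, n_items, item_share=0.9, student_share=0.9):
    """True if at least `student_share` of the students answer at least
    `item_share` of the `n_items` questions correctly."""
    passing = sum(1 for s in scores if s / n_items >= item_share)
    return passing / len(scores) >= student_share


def suitable_by_75_percent(score, n_items, threshold=0.75):
    """Thorndike's rule as summarized in the text: the material suits a
    student who answers at least 75 per cent of the questions made from it."""
    return score / n_items >= threshold


# Hypothetical end-of-instruction scores for 10 students on a 20-item test.
scores = [20, 19, 18, 18, 18, 19, 20, 18, 18, 15]
print(meets_90_90(scores, 20))           # 9 of the 10 students reach 18/20
print(suitable_by_75_percent(15, 20))    # 15/20 is exactly 75 per cent
```

Both rules are exact once the test is fixed, but, as the next paragraph argues, their meaning depends entirely on how hard that test happens to be.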

Nearly all proposals of criterion scores, including these 3, 
have assumed that tests made by the traditional procedures would be 
used to measure attainment of the criterion level of performance. 
Lorge (1948) pointed out that the absolute magnitude of scores on
tests of this type is usually subject to biases introduced by the
idiosyncrasies of the people who wrote the tests. Test items testing
different content and processes often differ widely and systematically
in difficulty. In addition, even slight variations in the phrasing of a test
item can sometimes lead to wide variations in the difficulty of the
item. Thus, a test writer can have a great deal of influence on the
difficulty of the test. In traditional test-writing methodology, the test
writer is only partially constrained with respect to the content and
processes that he may test and almost completely at liberty to phrase
his items to suit his personal preferences. Hence, 2 different test
writers might be expected to produce tests of quite different
difficulties to test exactly the same instruction. Thus, Lorge reasoned
that these criterion scores do not represent a standard level of
competence.
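Lorge's argument can be illustrated with a model that the article does not use: the one-parameter logistic (Rasch) item response model, swapped in here purely as a convenient assumption. Under it, a reader of ability theta answers an item of difficulty b correctly with probability 1 / (1 + e^-(theta - b)). The two item sets below are hypothetical stand-ins for 2 writers testing the same instruction.

```python
import math


def expected_score(theta, difficulties):
    """Expected proportion correct under a Rasch model (an assumption here)."""
    return sum(1 / (1 + math.exp(-(theta - b))) for b in difficulties) / len(difficulties)


# Two hypothetical tests of the same instruction by different writers.
writer_a = [-1.0] * 10   # uniformly easier phrasing
writer_b = [1.0] * 10    # uniformly harder phrasing

theta = 0.5  # one and the same reader
for name, test_items in [("A", writer_a), ("B", writer_b)]:
    p = expected_score(theta, test_items)
    print(name, round(p, 3), "meets 75%" if p >= 0.75 else "falls short")
```

The same reader is expected to score about 82 per cent on writer A's test and about 38 per cent on writer B's, so a fixed 75 per cent cutoff labels him differently depending only on who wrote the test.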

Few advocates of a criterion score have advanced a rationale
to justify their preference for the scores they chose. Bloom (1968)
appears to have been the exception. He pointed out that students
required to reach a high (but not necessarily perfect) level of
performance on each unit of content tend to exhibit similar levels of final
achievement, to overcome initial deficiencies in aptitude, and to
master succeeding units of content in less time. Bloom did not specify a
particular level of performance as a criterion. He simply advocated 
using a high level of mastery as a criterion. However, many have 
interpreted his concept of mastery to mean perfect performance. And 
a critique of this misinterpretation is highly relevant to the problem 
we are considering here. 

There are at least 3 reasons why perfect performance is
unlikely to be considered the most desirable level of performance.
First, using such a criterion is likely to drive some of the costs of 
instruction to preposterous levels. Almost every learning study shows 
that learning increases rapidly on the first few repetitions of the mate- 
rial and then flattens almost to the horizontal well before perfect 
performance is reached. Hence, attempting to reach perfect perform- 
ance is likely to be a time-consuming and expensive undertaking. 
Second, since efforts to reach this criterion are likely to involve much 
repetition and drill, we could anticipate adverse effects on the
students' attitude toward the content of instruction. That is, students find
much repetition boring and unpleasant, and they could transfer those 
attitudes to the content and thereafter try to avoid its further study 
and use. Third, attempting to reach perfect performance on a unit of 
content implies that all of the items of content in that unit are essen- 
tial to learn. This may be true of a few isolated units, such as that 
dealing with the multiplication tables, but most units of content deal 
with collections of content items that differ greatly in their utility to
the individual.
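The first objection can be made concrete under an assumed, idealized learning curve; the article cites no specific functional form. If the proportion p of the material mastered after t units of practice follows an exponential curve with learning rate k > 0, inverting the curve gives the practice needed to reach a target level:

```latex
p(t) = 1 - e^{-kt}
\quad\Longrightarrow\quad
t(p) = -\frac{\ln(1 - p)}{k},
\qquad
t(0.90) = \frac{\ln 10}{k} \approx \frac{2.30}{k},
\quad
t(0.99) = \frac{\ln 100}{k} \approx \frac{4.61}{k},
\quad
\lim_{p \to 1^{-}} t(p) = \infty .
```

Under this model, going from 90 to 99 per cent costs as much practice again as reaching 90 per cent did, and perfect performance is never reached in finite time.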

Reformulation as a rational problem. This critique of the 
criterion of perfect performance now puts us in a position to refor- 
mulate the problem of identifying criterion scores. The authors of 
most criterion scores stated a preference for a particular score but 
failed to support their preference with reason and evidence. This 
seemed to reduce the problem to simply making an arbitrary choice 
of a score. However, we have just seen a case in which plausible
arguments were offered in favor of setting a high criterion score and
equally plausible counter-arguments were advanced against setting
the score too high. Hence, the problem will clearly submit to rational
analysis and formulation.
Let us start with the proposition that neither literacy behaviors
nor any others are taught for the pure hedonistic pleasure of
learning. Rather, they are placed in curricula because they are valued
for the tangible and intangible things that those behaviors enable
individuals and society to attain. But even then, their placement in
curricula is decided only after these positive benefits have been
weighed against the negative benefits, that is, the costs associated
with learning the behaviors. Second, an individual can acquire varying
numbers of literacy skills, and this fact can be accurately indexed by
his scores on an appropriately made achievement test. Third, each
level of performance produces effects on each of the various benefits,
both positive and negative, that he is likely to accrue. Finally, the
problem of identifying a criterion score then consists in finding the
level of performance at which the overall benefits are the greatest.

An ideal solution 

This problem can be approached at 2 levels — either through 
an attempt to describe how one might establish a performance crite- 
rion right now, using practical procedures presently available to us, or 
through an analysis of the operations required to attain what might 
be thought of as the "ideal" solution. Both courses were elected because
a description of the ideal solution provides a framework within which
to evaluate the adequacy of any practical solutions proposed. A prac-
tical solution will subsequently be described and developed in some
detail.

In the ideal solution to the problem of establishing a perform- 
ance criterion, one should be able to attach a cost and a benefit weight 
to each literacy behavior separately, select those behaviors whose 
learning seems likely to produce a net positive benefit for the learner, 
and then determine what level of performance on a test of these 
behaviors is associated with the greatest expected benefit to the 
learner. 

It is useful, of course, to consider the value of whole classes 
of literacy behaviors as units, because one of the major attributes by 
which we categorize literacy behaviors is the subjective value we
assign to the function that those behaviors ordinarily perform for us.
Specifically, classes of literacy behaviors have traditionally been
defined not solely in terms of their psychological attributes but also in
terms of their functions and how we value those functions. For example,
being able to identify the plot of a story seems, at least to me,
to be no less a cognitive behavior than being able to identify the out- 
line or rhetorical patterning of an expository essay. And the 2 classes 
of behavior seem likely to share many elements in common. However, 
these 2 types of behaviors do function quite differently, and their
respective functions are valued quite differently. Identifying the plot
of a story functions as a skill intended to enhance aesthetic apprecia-
tion of literary materials, while identifying the rhetorical pattern of an
essay functions as a skill in more utilitarian tasks.

But not all of the behaviors within a category are equally 
useful or equally expensive to teach. For example, some phonics rules 
apply to many different words that occur frequently in the language, 
while other rules seldom apply; thus, these rules can be said to
differ in value. Similarly, the syllabication rules, which could be very 
useful in phonics behaviors, seemingly cannot be taught as fully effec- 
tive reading skills until students have learned to discriminate between 
English words of Germanic and Romance language origins, an opera- 
tion that seems likely to be so costly that no modern scholars are
seriously advocating it. Doing so involves too great an expense in view
of the rather limited benefits to a reader.⁶ Thus, when the individual
behaviors of which phonics is composed are analyzed, each may be 
assigned a different set of benefit and cost values. Consequently, 
before we can decide on a criterion level of performance for a given 
category of behaviors, we must give careful consideration to selecting 
the individual behaviors to appear in that set and the values to be
assigned to each. 

It would be both ideal and very convenient if we could then
simply sum these cost and benefit values across behaviors to arrive at
performance criterion scores. And, if we had complete knowledge of
the nature of all literacy behaviors and of their relationships to each
other, we could undoubtedly perform such an operation with a fair
degree of confidence in the results. Unfortunately, at the present time 
we cannot, for example, even identify all of the literacy behaviors. 
Nor do we know how the word pronunciation behaviors relate to each 
other. It is quite commonly observed that a word is more easily pro- 
nounced when it appears in context than when it appears out of context,
and consequently that context guessing skills interact with other
word recognition skills, with the result that including one type of skill
in the instruction probably influences the costs and benefits arising
from including others. Hence, it is impossible at this stage of our
knowledge to arrive at a performance criterion by simple summing
operations, at least not without doing a considerable amount of
research.

6 Statements made here about print-to-sound phonics (i.e., reading phonics) should
not be confused with or generalized to statements about sound-to-print phonics
(i.e., spelling phonics).

A second practical problem stems from the fact that the lit- 
eracy behaviors have not been analyzed to the point where we can 
separately identify all of the important processes involved in literacy. 
Indeed, it is not even clear what is meant by all of the different proc- 
esses. Two processes differ when they contain either different compo- 
nents or the same components differently related to each other. And 
it is at least conceivable that we could analyze a process until we had 
identified the activities of individual neurons and the sequences of 
those neuronal activities. If we did so, we potentially would have a 
very large number of different processes. But obviously an analysis
thus detailed would be extraordinarily awkward for use in instruction, 
since it is neither necessary to use so fine an analysis in instruction, 
nor practically possible to operationalize the instructional and testing 
procedures of each of the different processes identified at this level of 
analysis. Consequently, deciding what literacy behaviors should be 
taught and tested depends on considerations of how far it is necessary 
and desirable to analyze the processes underlying these behaviors in 
order to obtain instruction with the desired level of effectiveness. Thus, 
it is difficult to see how we could establish a performance criterion on
each individual literacy behavior. 

A third problem that will have to be solved is how to demon-
strate the pragmatic validity of the items in a literacy test that is
intended to exhaustively test a category of processes. A person is lit-
erate with respect to a particular real-world reading task if he can
perform it competently. Yet we are proposing not to observe him per-
form that task but rather to analyze the performances required by all
such tasks into the abstract processes underlying them, to operation-
alize those processes using test items that differ in many respects from the
real-world reading tasks, and then to infer that some level of perform-
ance on our test of the abstracted processes permits us to claim that
he could perform competently on the real-world tasks. Demonstrating
the validity of these inferences amounts to a demonstration of the
validities of the theories about the processes underlying the literacy
behaviors. Hence, when this argument is pursued, it can be seen that 
it is impossible to achieve the ideal performance criterion until a very 
great amount is known about the literacy processes, a goal which
probably will not be achieved without a fairly large amount of re- 
search. 

Procedures presently practical 

All this is not to say, however, that it is impossible to greatly
improve present test-writing procedures. We are, in fact, developing 2
testing procedures that remedy many of the weaknesses of tests con- 
structed by traditional procedures, and we can employ these proce- 
dures a) to establish rational performance criteria for at least 2 of the 
major categories of literacy behaviors and b) subsequently to incor- 
porate these criteria into literacy assessment procedures that are sub- 
stantially better than those in current use. This section will describe 
the first step in this procedure, and subsequent sections will describe 
the remaining steps. 

Basic design of the procedure. The approach suggested here 
begins by accepting the unpleasant reality that we have very incom- 
plete theories about the processes underlying literacy behaviors and, 
therefore, that we cannot at the present time have operational proce- 
dures for testing those processes in such a manner that each process 
is individually identifiable. Instead, it asserts 1) that we do possess 
operational testing procedures that seem to test most of the word 
recognition and literal comprehension processes involved in literacy
behaviors, even though these testing procedures do not permit us to 
isolate each individual process; 2) that these test-making procedures 
are adequately operational for establishing rational performance cri- 
teria because they do not permit test writers to bias the tests; and 
3) that these rational performance criteria can be incorporated into a 
literacy assessment design that will produce results that, though ad- 
mittedly short of ideal, are more believable than any produced by the
traditional methods of assessing literacy. 

In the approach that seems practical at this time, the crite- 
rion functions somewhat differently than in the approach already 
described. Instead of attempting to abstract underlying processes and 
to attach a value to each one separately, the approach suggested here
proposes to use testing procedures that seemingly test a variety of
reading processes. The reading tasks are selected to have direct prac-
tical significance, and then research is conducted to identify the most 
desirable score for people to attain on these tests. It will be noted that
this approach requires, at the very least, both operational test-making
procedures that can be applied to any passage and that do not permit
the idiosyncrasies of test writers to bias the results, and a theory for
deciding what is the most desirable score (the performance criterion
score) on any test made by this procedure. Thus, in this approach one
simply attempts to develop procedures that determine whether a per- 
son is literate with respect to a specific, important, real world reading 
task. When this has been done, we can then employ this operation to 
determine any person's literacy with respect to any set of real-world 
reading tasks. 

Criterion identification model. Now, let us consider the prob-
lem of how one decides which score on a test is the most desirable
score. This topic is necessarily abstract. However, it can be made more
comprehensible by illustrating it with data from a series of studies
that I am currently conducting. The objectives of these studies are 1) to
develop a rational model for identifying criterion scores, and 2) to 
employ the model to identify criterion scores for use in interpreting 
scores on cloze tests that are employed for evaluating the compre- 
hensibility of instructional materials. 

The reasoning behind this model is fairly simple. People read
because doing so produces several effects on them, and they value
those effects. Hence, if one of the scores on a test made from a pas-
sage is any more desirable than any of the other scores, that score is
more desirable because it is normally associated with a greater total
value arising out of these effects. To be a bit more specific, the value
($V$) of a given score ($C_i$) on the criterion test ($C$), in this case a cloze
test, is given by

$$V(C_i) = (w_1 E_1) + (w_2 E_2) + \cdots + (w_n E_n)$$

where $w_1$ stands for the value we place on effect number one, and
$E_1$ stands for the amount of effect of type one that we normally expect
to find associated with cloze score $i$. The value derived from each indi-
vidual effect is obtained in exactly the same way that we calculate the
value in any other accounting problem: we multiply the number of
units of $E_1$ that we obtain by the average value ($w_1$) of each of those
units. Thus, this model simply claims that the value of a given test
score is the sum of the values of each of the individual effects that we
normally expect to find associated with that score. The criterion score 
for the test is the score having the greatest summed value (V). 
Note that this model is quite general and can be applied to any test, 
regardless of how the test is made and regardless of what that test is 
measuring. 
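As a purely hypothetical worked example (the weights and effect amounts are invented for illustration and are not data from the study), suppose only 2 effects are considered, with weights $w_1 = 3$ and $w_2 = 2$, and that the expected effect amounts are $E_1 = 4$ and $E_2 = 5$ at a cloze score of 40 per cent, and $E_1 = 6$ and $E_2 = 3$ at a cloze score of 50 per cent. Then

$$V(C_{40}) = (3 \times 4) + (2 \times 5) = 22, \qquad V(C_{50}) = (3 \times 6) + (2 \times 3) = 24,$$

so, of these 2 scores, 50 per cent would be preferred even though it is associated with less of the second effect, because it yields enough more of the more heavily weighted first effect.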

Eight steps are required in order to apply this model. First, a
criterion test ($C$) must be selected; this illustration employed the
cloze readability test. Second, the associated effects ($E_1$, $E_2$, etc.) must
be identified and tests must be developed to measure them. These
include both negatively and positively valued effects. Third, the crite-
rion test and the measures of the other effects are administered in a
manner that permits each effect for a person to be associated with his
score on the criterion test. Fourth, the average amount of each effect
is calculated for each score on the cloze test. Fifth, the people who are
best able to estimate the relative values of each of the effects are iden-
tified, and we obtain the average of the values that they assign each
effect. Sixth, each effect is multiplied by its value. Seventh, for each
cloze score, the values of the individual effects are summed. Eighth,
the cloze score having the greatest value is found and designated as 
the criterion score. 
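Once the data from the third step are in hand, the fourth through eighth steps are simple bookkeeping. The Python sketch below is one way to express them. It assumes a hypothetical record format (one `(cloze_score, {effect_name: amount})` pair per student), takes the judged values from the fifth step as a given dictionary, and omits the conversion of effects to standard scores. The data are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def find_criterion_score(records, weights):
    """Steps 4-8: average each effect at each cloze score, weight the
    averages, sum them, and return the score with the largest total.

    records: iterable of (cloze_score, {effect_name: amount}) pairs,
             one per student (hypothetical format for the step 3 data).
    weights: {effect_name: average value assigned by the judges} (step 5).
    """
    # Step 4: group each effect's observations by cloze score.
    observed = defaultdict(lambda: defaultdict(list))
    for score, effects in records:
        for name, amount in effects.items():
            observed[score][name].append(amount)

    # Steps 6 and 7: weight each averaged effect and sum across effects.
    totals = {
        score: sum(weights[name] * mean(amounts) for name, amounts in effects.items())
        for score, effects in observed.items()
    }
    # Step 8: the criterion score has the greatest summed value.
    return max(totals, key=totals.get), totals


# Invented data: cloze score (per cent) and 2 effect measures per student.
records = [
    (40, {"gain": 4.0, "rate": 5.0}),
    (40, {"gain": 6.0, "rate": 3.0}),
    (50, {"gain": 7.0, "rate": 4.0}),
    (50, {"gain": 5.0, "rate": 2.0}),
    (60, {"gain": 6.0, "rate": 1.0}),
]
weights = {"gain": 3.0, "rate": 2.0}

best, totals = find_criterion_score(records, weights)
print(best, totals)  # 50 {40: 23.0, 50: 24.0, 60: 20.0}
```

Averaging before weighting mirrors the order of the steps in the text. Because the weighted sum is linear, weighting each student's effects first and then averaging would give the same totals.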

Application of the model. In the study⁷ that will serve to
illustrate the application of this model, 4 effects were identified: Infor-
mation Gain, Rate of Reading, Conceptual Difficulty, and Willingness-
to-Study. Information Gain was measured by testing a person's knowl- 
edge of a passage both before and after he had read it and then 
calculating a residual gain score for him. The tests used to do this 
were made by a fairly operational procedure that involved using the 
sentence and intersentence syntax of a passage to transform the text 
into questions and then drawing a stratified random sample of the 
questions that resulted. Rate of Reading was measured by having a 
student read a passage, noting how much time it took him to complete 
the task, and then calculating the number of words read per minute. 
Conceptual Difficulty was measured by having the student read a pas-
sage and then rate it on a 7-point scale in which the extremes were
labeled much too easy and much too difficult. The scale was then
folded so that the scores ranged from 1 (much too easy or much too
difficult) to 4 (about right). Willingness-to-Study was measured by having the
student read and then rate a passage on a 7-point scale that ranged
from like very much to dislike very much.⁸

7 This study employed a very complex design that involved the use of passages of
varied difficulty, counterbalanced rotations of materials, and the like. If the reader
wishes, he can find the details in a preliminary technical report (Bormuth, 1971).
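The folding of the Conceptual Difficulty scale can be stated exactly. The sketch below assumes that ratings are coded 1 (much too easy) through 7 (much too difficult) and that the midpoint, 4, is the about right position. The function name is hypothetical.

```python
def fold_difficulty_rating(rating: int) -> int:
    """Fold a 7-point rating (1 = much too easy, 7 = much too difficult)
    so that both extremes become 1 and the midpoint (about right) becomes 4."""
    if not 1 <= rating <= 7:
        raise ValueError("rating must be between 1 and 7")
    # Distance from the midpoint, subtracted from 4: symmetric around "about right".
    return 4 - abs(rating - 4)


folded = [fold_difficulty_rating(r) for r in range(1, 8)]
print(folded)  # [1, 2, 3, 4, 3, 2, 1]
```

After folding, a passage judged too easy and one judged equally too difficult receive the same score, so the folded rating increases as a passage approaches the about right point.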

The value of each effect was determined by having the teach-
ers of these students rate the behaviors. The teachers were given a
description of each of the tests and asked to rate the relative values
of the behaviors on a 10-point scale. The average rating given each
behavior was then used in the model. 

A sample of the results is shown in Figure 1.⁹ This figure
might have been built up in this way. Consider a cloze score of 50 per
cent, for example. The average Information Gain score of students
who obtained 50 per cent was calculated¹⁰ and then multiplied by the
weight the teachers had given it. This yielded a value for that effect of
about 11. (This number has only a relative meaning.) This point
could then be plotted, and it would fall on or very near the lowest curve
on the graph, the curve labeled Information Gain. This same process
could next be repeated using the mean rate score and the weight
assigned to that variable. This amount of value could thereupon be 
added by measuring up from the point on the Information Gain curve. 
This would produce a point that falls on or very near the second curve 
labeled Rate. The process could then be repeated for each of the re-
maining 2 measures to obtain points that fall on or near the 2 upper-
most curves. The point on the top line of this plot represents the total
value a reader could normally be expected to receive from a passage
if he were able to obtain a cloze score of 50 per cent on that passage.
By repeating this process for each of the other cloze scores and con-
necting the points obtained for each effect, we would obtain the curves
shown in this figure. Since the top curve in this graph represents the
total value associated with each cloze score, the cloze score at which that
curve reaches its highest point (approximately 45 per cent) represents
the criterion score.

8 Explicit instructions accompanied these rating scales. This study used tests made
from 32 passages representing 8 levels of difficulty and administered those tests to
1,600 students, who were drawn in equal numbers from grades 3 through 12.

9 It should be cautioned that this figure grossly oversimplifies the results. The
curves for each of the effects differed with the grade levels of the students, and the
weights assigned to each of the effects and the curves for the rating scales differed
when they were analyzed according to the type of reading assignment in which the
given materials were supposed to be used. Therefore, the criterion score that can be
identified with this figure will vary somewhat from the criterion scores that are actu-
ally appropriate for students at various grade levels who are asked to perform various
kinds of reading tasks.

10 The scores for all of the effects measured were transformed into standard scores
to remove arbitrary scale effects.
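The way Figure 1 is built up, by measuring each new curve up from the one beneath it, amounts to taking running sums of the weighted effects at each cloze score. The sketch below uses invented values for the 4 effects at only 3 cloze scores, so its peak says nothing about the study's actual results.

```python
from itertools import accumulate

scores = [30, 45, 60]  # cloze scores (per cent)
# Invented weighted values (weight times mean effect) at each score,
# listed from the bottom layer of the plot to the top.
layers = [
    ("Information Gain", [8.0, 10.0, 12.0]),
    ("Rate", [1.0, 3.0, 5.0]),
    ("Conceptual Difficulty", [2.0, 6.0, 1.0]),
    ("Willingness-to-Study", [3.0, 4.0, 1.0]),
]

# At each score, the height of each curve is the running sum of that layer and
# all layers below it, i.e. "measuring up" from the curve beneath.
stacked = {
    score: list(accumulate(values[j] for _, values in layers))
    for j, score in enumerate(scores)
}
# The top curve is the total value; its peak is the criterion score.
total = {score: heights[-1] for score, heights in stacked.items()}
criterion = max(total, key=total.get)
print(stacked[45], criterion)  # [10.0, 13.0, 19.0, 23.0] 45
```

Only the top curve matters for locating the criterion score; the intermediate curves show how much each effect contributes to it.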
Formal model. In a general discussion of this sort it was
undesirable to go into much detail on matters such as the rationale of
the model, the instrumentation of the tests, the design of the studies,
or the treatment of the data. These will be made available in future
publications. However, it may be desirable to present the exact form
of the model for the reader to evaluate. It is given by the expression 

$$V(C_i \mid B_1, B_2, \ldots, B_b) = (w_1 f_1(C_i)) + (w_2 f_2(C_i)) + \cdots + (w_n f_n(C_i))$$

In this expression 

$V$ = total value normally associated with a given score on the crite-
rion test;

$C_i$ = the score $i$ on the criterion test $C$, in this case the cloze test;

$B_b$ = boundary condition number $b$. Boundaries must include
those factors that define the domain within which the criterion score
is applicable, factors such as the age of the student, the subject
matter of the material, and the purpose for which the material is
read;

$V(C_i \mid B_1, B_2, \ldots, B_b)$ = the total value, $V$, normally found to be
associated with the score $i$ on the criterion test $C$ within the
boundary conditions $B_1, B_2, \ldots, B_b$;

$w_x$ = the relative weight normally assigned to effect $x$ ($x = 1, 2, \ldots, n$),
effects such as the person's average expected income, the proportion of news
editorials that the person is likely to be able to read at a desirable
level, etc.;

$f_x(C_i)$ = the amount of effect $x$ associated with the score $i$ on the
criterion test as given by the regression of that effect on the crite-
rion test. This measure is expressed in some standard form to
remove arbitrary scale effects.
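The boundary conditions mean that the weights and effect curves are looked up within a particular domain before the criterion score is found, so different domains can yield different criterion scores. In the sketch below, the boundary conditions (grade band and reading purpose), the weights, and the quadratic and linear curves standing in for the fitted regressions $f_x(C_i)$ are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

Boundary = Tuple[str, str]  # hypothetical: (grade band, reading purpose)
Effect = Tuple[float, Callable[[float], float]]  # (w_x, f_x)

# Hypothetical weights and effect curves within each boundary condition.
# The curves stand in for regressions of standardized effects on the cloze score.
MODELS: Dict[Boundary, Dict[str, Effect]] = {
    ("primary", "study"): {
        "gain": (3.0, lambda c: -(((c - 40.0) / 10.0) ** 2)),
        "rate": (1.0, lambda c: c / 50.0),
    },
    ("secondary", "study"): {
        "gain": (3.0, lambda c: -(((c - 50.0) / 10.0) ** 2)),
        "rate": (1.0, lambda c: c / 50.0),
    },
}


def total_value(c: float, boundary: Boundary) -> float:
    """V(C_i | B_1, ..., B_b): the sum of w_x * f_x(C_i) within one domain."""
    return sum(w * f(c) for w, f in MODELS[boundary].values())


def criterion_within(boundary: Boundary, candidates: Iterable[int]) -> int:
    """Return the candidate cloze score with the greatest total value."""
    return max(candidates, key=lambda c: total_value(c, boundary))


candidates = range(0, 101, 5)
print(criterion_within(("primary", "study"), candidates))    # 40
print(criterion_within(("secondary", "study"), candidates))  # 50
```

Keeping a separate model for each boundary condition reflects the caution that a criterion score applies only within the domain in which it was identified.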

Four characteristics of this model should be noted before we 
leave the subject. First, almost any kind of test can be used as the 
criterion test, including the typical standardized achievement and
aptitude tests. The metric and content of those tests are fairly arbitrary
in the sense that the items contained in them are selected primarily to
produce a particular set of metrical characteristics for the test and
only secondarily to test a "representative" sample of some body of
content. However, only operationally defined tests may be used to
measure what we have been calling the effects in the model. A crite-
rion score is normally generalized to a population of instructional
stimuli, and this cannot be done legitimately unless we can be assured
that fairly comparable measures were applied to the sample of stimuli
originally used to identify the criterion score. Second, the model is not
biased by including irrelevant effects. An effect may be irrelevant
either because it is unrelated to the criterion measure or because
people place a zero value on the effect. In either case, the crite-
rion score identified will be unaffected by including an irrelevant
effect. Third, the validity of a criterion score identified with this
model diminishes as the proportion of the relevant effects included
in the model decreases, as the correlations of the excluded effects
with the criterion increase, and as the value people place on the
excluded effects increases. Finally, it should be noted that the model
in its present form does not attempt to reconcile the trade-off between
future and immediate benefits. But since the accounting procedures for doing so are
well known and since those procedures were not particularly relevant
at this point in the discussion, no effort was made to incorporate them 
into the model. However, it should be noted that they are relevant for 
other uses of this model. 
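The second characteristic, that irrelevant effects do not bias the criterion score, can be checked numerically. An effect unrelated to the criterion has a flat curve and therefore adds the same constant to every score's total, and a zero-valued effect adds nothing. The curves and weights below are hypothetical.

```python
def criterion(curves, weights, candidates):
    """Return the candidate score that maximizes sum_x w_x * f_x(score)."""
    def total(score):
        return sum(weights[name] * f(score) for name, f in curves.items())
    return max(candidates, key=total)


candidates = range(0, 101, 5)
curves = {
    "gain": lambda c: -(((c - 45.0) / 10.0) ** 2),  # hypothetical curve
    "rate": lambda c: c / 100.0,                    # hypothetical curve
}
weights = {"gain": 3.0, "rate": 1.0}
base = criterion(curves, weights, candidates)

# An effect unrelated to the criterion: its curve is flat, so it adds the
# same amount to every candidate's total and cannot move the maximum.
unrelated = criterion({**curves, "unrelated": lambda c: 2.5},
                      {**weights, "unrelated": 4.0}, candidates)

# An effect that does vary with the score but is valued at zero.
zero_valued = criterion({**curves, "unvalued": lambda c: (c % 7) - 3.0},
                        {**weights, "unvalued": 0.0}, candidates)

print(base, unrelated, zero_valued)  # 45 45 45
```

The check does not cover the third characteristic: omitting a relevant effect, one that both varies with the score and carries a nonzero weight, can move the maximum.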

Identification of corpuses of tasks 
and a corpus criterion 

The third parameter that must be described in a definition of
literacy is the kind of language with regard to which people should be 
literate. The goal of a literacy program would be hopelessly vague 
unless the definition on which it was based contained such a descrip- 
tion. Language is used for many areas of discourse and for many dif- 
ferent purposes within those areas. The language employed in each 
area differs materially in vocabulary structures, in sentence struc-
tures, and in discourse structures; and each of those different struc-
tures presumably requires a different literacy process or level of skill
to cope with it. Thus, if a definition failed to specify the corpus or
population of reading tasks it dealt with, it would implicitly commit
the program to deal with all possible corpuses of discourse, a task of
overwhelming magnitude.

In view of the facts that some omissions must be made for 
practical reasons and that the omission of literacy skills from a lite- 
racy program can have important social consequences, the position 
taken here is that a literacy definition is both dangerously vague and
incomplete unless it specifies the types of language with
respect to which people are expected to demonstrate literacy. There
are 2 aspects of language population selection that must be discussed. 
The first is a discussion of what criteria should be used to select lan- 
guage populations in order to reflect accurately the values of society 
and the individual. The second is a discussion of a procedure for estab- 
lishing a criterion score for determining if a person is literate with
respect to a population of written language. 

Criteria for selecting corpuses 

The procedure for selecting corpuses of language and the 
rationale for that procedure must be made as explicit as possible. 
Literacy skills may be employed in a number of important political,
social, cultural, and economic activities in our society; and having or
not having those skills has direct implications for a person's rights,
responsibilities, and opportunities to participate in those activities.
For example, it is not uncommon to hear of persons, and even whole
groups of people, who vote for candidates actively working against
their own best interests, or of people who failed to receive job
promotions because they could not acquire the information necessary
to carry out their new duties. Since situations of this kind are trace-
able, at least in part, to those people's failures to acquire certain 
literacy skills, it must be recognized that both society and the indi- 
vidual have the right to know exactly what literacy skills are selected 
or excluded from instruction and the reasoning by which these deci- 
sions were made. Or, stated another way, these decisions cannot, as 
they have been up to the present, be left obscure by allowing them to 
be treated as the creative acts of individual teachers or as the unex- 
plained technical decisions made by publishers of instructional ma-
terials. 

This matter has not previously received the careful analysis it 
deserves, and the present discussion will merely pose the problem and 
demonstrate the need for its further analysis. At least 5 criteria seem
relevant to making decisions about whether to include or exclude a
corpus of reading tasks: 1) monetary cost, 2) economy of time,
3) value-achieving utility, 4) commonness, and 5) frequency.

The first criterion that must be considered is how much
money is available for teaching literacy skills and how much it will
cost to teach each person enough literacy skills so that he is literate
with respect to a given corpus of reading tasks. While the monetary
cost involved in literacy programs does not in any sense represent our
highest value in these matters, it nevertheless sets rigid bounds
within which our other values may be achieved. At the present time,
it is not clear how one might go about obtaining an accurate estimate 
of the cost of instructing people in literacy with respect to a particular 
corpus of tasks. The cost would depend not only on the nature of the 
tasks, but also on what other tasks were being taught and the se- 
quence in which they were taught, because there would undoubtedly 
be major transfer effects among different corpuses of tasks. However, 
it is clear that this criterion warrants very careful attention because 
the resources expended developing literacy for one class of tasks 
necessarily preempts the use of those resources to develop literacy on 
other tasks. 

Second, we must consider the amount of instruction time that 
must be devoted to achieve literacy on a corpus of tasks. Perhaps 
because instruction usually produces large long-range benefits for an 
individual and because education in the past has been under-
supported, we tend to think of instruction as a general good that we can
never get enough of. But this notion must be carefully examined, for 
instruction inevitably consumes a substantial part of a human being's 
most valuable and irreplaceable resource, his life. With a major and 
growing proportion of our population now enrolled in formal school-
ing for as much as a quarter to a third of the average human life
span, educators cannot for much longer continue to treat the time 
spent in instruction as if it were a valued but essentially inexhaustible 
commodity. Hence, we cannot use this resource indiscriminately for 
teaching literacy skills, but must ask whether the benefits derived 
from literacy instruction on a class of tasks really warrant the expen- 
diture of time when we consider the other literacy skills the student
could have been acquiring and, for that matter, the other educational 
and noneducational activities he could have been engaged in. Making 
the time estimates required for applying this criterion promises to 
present problems that are similar to and at least as complex as those 
involved in making monetary cost estimates. 

The third criterion is the value-achieving utility of acquiring 
literacy skills on a set of reading tasks. Each corpus of reading tasks 
can be employed to achieve some sort of social, political, cultural, or 
economic values for a person. These values should be assigned
weights that correspond to some consensus of their relative impor-
tance, and each class of reading tasks should then be evaluated and
ranked in terms of these values. 

The fourth criterion is the degree to which all people must
deal with the class of reading tasks, that is, the commonness of the task. A
considerable degree of economy may be achievable by separating those 
tasks on which everyone should be literate from those tasks associated 
with special occupations and hobbies. Only those tasks that are com- 
monly needed by all should be included in the definition used with a 
basic literacy program to be conducted for everyone. The specialized 
tasks may then be included in definitions of special literacy programs 
designed for those who seek specialized training. 

The fifth criterion, the frequency with which a type of task is 
encountered, appears to have a dubious value for the selection of cor- 
puses of reading or literacy tasks. According to this criterion, one 
would assign each class of tasks a value corresponding to the fre- 
quency with which a person must deal with the tasks of that kind and 
then select just those classes of tasks that occur most frequently. For 
example, by this criterion one would select the tasks involved in read- 
ing newspaper stories about foreign affairs while perhaps excluding 
the reading of diplomatic position papers dealing with similar events. 
The fallacy inherent in this criterion is that some tasks, such as read- 
ing the fine print in a sales contract or a sign saying high voltage, may 
occur so rarely that they would be excluded by this criterion but have 
consequences so critical that they could not be ignored. If this criterion 
is employed at all, it should be done only with caution. 

Criterion of corpus literacy 

Deciding that a person should acquire literacy on some corpus 
of printed language presents us with the familiar question of what
level of performance we are willing to consider a satisfactory level of 
performance, and with the problem of measuring that performance. 

Instrumentation. It was pointed out earlier that a criterion
score can be identified using almost any kind of test as the criterion 
measure and that even the typical standardized achievement and 
aptitude tests, the so-called norm-referenced tests, can be used for 
this purpose. This statement should be qualified somewhat at this 
point. Great care is often taken in constructing these tests to develop 
a pool of test items that actually test a domain of content and to rep-
resent that domain of content adequately. However, the chief function
of these tests is to discriminate reliably among the people to be tested.
Consequently, when the test is composed, the items selected are usu- 
ally selected on the basis of statistical criteria. Miller (in press) has 
conducted a study using data from the Venezuelan National Assess- 
ment of Mathematics skills in which he found that applying these 
criteria seriously reduced the extent to which the test items could be 
said to represent the content of the instruction. Items representing 
some major blocks of content were largely and even wholly eliminated 
from tests while the items representing other blocks of content were
vastly over-represented in tests. The proponents of the current norm-
referenced tests often counter by arguing that their tests, nevertheless,
yield such high correlations with tests that do represent a domain of 
content that the results are indistinguishable. This claim seems fairly 
likely to be borne out. However, we still cannot completely discount 
the counter claim that, regardless of the size of these correlations, the 
norm-referenced test may misrepresent substantial blocks of relevant
content and, therefore, does not actually represent the content domain 
in the way that a criterion test is normally expected to represent it. 

Scaling a standard criterion test. There are 2 major operations
involved in this procedure. The objective of the first is to estimate
the distribution of the cloze difficulties of the tasks in the corpus
of tasks being considered. The object of the second is to scale a stand- 
ard criterion test so that scores on it can be used to estimate the 
proportion of passages on which a person is literate. The first opera- 
tion consists in selecting a fairly large random sample of passages 
from the corpus of reading tasks, making a cloze readability test over 
each, administering these tests to subjects, calculating the difficulty 
of each passage, arraying these difficulties in a distribution, and then 
assigning a percentile score to each passage's score along this distri- 
bution. Each person tested is also required to take a test selected to 
serve as a standard criterion test. 
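A minimal Python sketch of this first operation follows. It assumes that each passage's cloze difficulty is summarized as the mean proportion of blanks answered correctly by the subjects who took that passage's test, and it reads "percentile" as the share of sampled passages that are at least as easy as the given passage, so that a person who just reaches the criterion on a passage can be credited with that share of the corpus. The function names and data layout are illustrative assumptions, not the author's procedure.

```python
from statistics import mean

def passage_difficulties(cloze_scores):
    """Mean cloze score (proportion correct) for each sampled passage.

    cloze_scores maps a passage id to the list of subjects' proportions
    correct on the cloze test made over that passage.
    """
    return {pid: mean(scores) for pid, scores in cloze_scores.items()}

def passage_percentiles(difficulties):
    """Share of sampled passages that are at least as easy as each passage.

    Higher mean cloze scores mean easier passages, so the hardest passage
    receives 1.0 and, when there are no ties, the easiest receives 1/n.
    """
    values = list(difficulties.values())
    n = len(values)
    return {pid: sum(v >= d for v in values) / n for pid, d in difficulties.items()}

# Illustrative data only: three passages, two subjects each.
cloze = {"A": [0.6, 0.5], "B": [0.3, 0.4], "C": [0.45, 0.45]}
print(passage_percentiles(passage_difficulties(cloze)))
```

Here passage B, the hardest, is assigned the full corpus, because anyone who can read it at the criterion level can presumably read the easier ones as well.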

The second operation involves several steps that result in a 
2-column table. The first column contains the raw scores on the stand- 
ard criterion test and the second column contains a percentile score 
corresponding to each raw score. These percentile scores show the 
proportion of cloze tests on which the average person receiving the 
corresponding raw score was able to score at least as high as the
criterion level of literate performance. The validity of the operation rests
on several unreported studies in which the author has consistently
found that, when scores on several cloze and other operationally
defined tests are regressed on the scores from either another cloze test 
or a standardized test of reading achievement, the slopes of these 
regressions are essentially parallel. Ceiling and floor effects often dis- 
tort the distributions of scores, but logit transformations render the 
slopes parallel. 

The first step of this scaling operation is to regress each set 
of test scores on the standard test, performing a logit transformation 
whenever ceiling or floor effects are present. The second step consists
of calculating the equation for each of these regressions. In the third 
step the criterion score selected in the manner described in the previous
section is substituted in each equation and the equations solved
to determine, for each passage, the raw score on the standard test that 
corresponds to the criterion score on the test over the passage drawn 
from the population of tasks. The last step is to take the percentile
score assigned to that passage and assign it to the raw score calculated
from the regression equation on that passage. For purposes of
clarity, the various details of these calculations have been omitted. 
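The 4 steps can be sketched roughly as below, without reconstructing the details the author omits. The sketch makes several assumptions of its own: it applies the logit transformation to every passage (the text applies it only when ceiling or floor effects are present), it clips proportions at a hypothetical bound `eps` so the logit stays finite, it uses ordinary least squares, and it expresses the criterion as a cloze proportion correct. All names are illustrative.

```python
import math
from statistics import mean

def logit(p, eps=0.01):
    # Clip proportions away from 0 and 1 so the transform stays finite;
    # the bound eps is an assumption, not a value from the article.
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for regressing ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def scaling_table(passages, criterion, percentiles):
    """Build the 2-column table of (standard-test raw score, percentile).

    passages maps a passage id to a pair of lists for the same subjects:
    their raw scores on the standard criterion test and their cloze
    proportions correct on that passage. criterion is the cloze score
    taken as literate performance; percentiles comes from the first operation.
    """
    rows = []
    for pid, (raw, cloze) in passages.items():
        # Steps 1 and 2: regress logit cloze scores on the standard test.
        a, b = fit_line(raw, [logit(c) for c in cloze])
        # Step 3: raw score at which the predicted cloze score equals the criterion.
        raw_at_criterion = (logit(criterion) - a) / b
        # Step 4: attach the passage's percentile to that raw score.
        rows.append((raw_at_criterion, percentiles[pid]))
    return sorted(rows)
```

If the regression slopes are parallel, as the studies cited above suggest, harder passages map to higher raw scores, so the sorted table should also rise in percentile.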

Thus, when this standard test is administered to some new 
subject, we can use it to estimate his corpus literacy level — the per- 
centage of tests on which this person could achieve a criterion level of 
performance if new passages drawn from the same corpus of tasks 
are tested. 
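One simple way to read a new subject's corpus literacy level off such a table is a step-function lookup, sketched below. Crediting the person with the largest percentile whose required raw score he meets, and with 0.0 when he meets none, is an assumption; interpolating between rows would be an equally reasonable choice.

```python
def corpus_literacy(raw_score, table):
    """Estimated proportion of the corpus read at the criterion level.

    table holds (raw score at criterion, percentile) rows, as produced by
    the scaling operation. A person is credited with the largest percentile
    whose required raw score he meets, and with 0.0 if he meets none.
    """
    reachable = [pct for raw_needed, pct in table if raw_needed <= raw_score]
    return max(reachable, default=0.0)

table = [(20.0, 0.5), (30.0, 1.0)]  # illustrative rows, not real data
print(corpus_literacy(25, table))
```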

Selecting a corpus criterion score. Although it is undoubtedly 
informative and possibly even useful to know the proportion of pas- 
sages on which a person is literate, we must again raise the familiar 
question: on what proportion of the passages should he be literate?
Or how good is good enough? And the answer again seems to rest on 
the development of a decision theory that permits us to consider simul- 
taneously the relevant negative and positive benefits associated with 
each level of literacy and to obtain a criterion score that maximizes 
the values we wish to derive from literacy instruction. This will be 
referred to here as the corpus criterion score. 

This procedure should utilize the criterion identification 
model and proceed much as we did in the previous illustration. First, 
we would identify the negative and positive benefits associated with 
being able to read various proportions of the corpus at a desirable
level. These might include estimates of a person's expected income
and other measures of his occupational success, estimates of the
costs associated with raising people to each level of literacy, and estimates
of the degree to which people can and do participate effectively in
political-civil affairs and in cultural activities. Next, these measures would be
assigned relative values.¹¹ Finally, the data would be entered into
the model. The model would thereupon identify, for some hypothetical 
average person, the highest level of performance on that corpus hav- 
ing a positive value. 
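A minimal sketch of the final step, assuming the model reduces to a linear, additive net value (weighted effects minus costs) at each candidate proportion of the corpus. Following the text, it returns the highest level whose net value is positive for the hypothetical average person. The additive form, the parameter names, and any numbers fed to it are assumptions, not estimates from the article.

```python
def corpus_criterion(levels, effects, weights, costs):
    """Highest corpus proportion whose weighted net value is positive.

    levels: candidate proportions of the corpus, in increasing order.
    effects: effect name -> estimated effect at each level (e.g. income).
    weights: effect name -> relative value assigned to that effect.
    costs: estimated cost of raising a person to each level.
    Returns None if no level has a positive net value.
    """
    best = None
    for i, level in enumerate(levels):
        value = sum(weights[name] * est[i] for name, est in effects.items()) - costs[i]
        if value > 0:
            best = level
    return best

# Hypothetical figures: net values are 6, 6, and -4, so 0.7 is selected.
print(corpus_criterion([0.5, 0.7, 0.9], {"income": [10, 14, 16]}, {"income": 1.0}, [4, 8, 20]))
```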

The reader is again reminded that the notion of a corpus 
criterion score cannot be fully acceptable unless certain characteris- 
tics of individuals and populations of people are taken into account. 
The treatment given here was intended only to develop the concept of 
corpus criterion scores and to illustrate a rational procedure by which 
they can be identified using whatever test procedure might be suitable 
and available. 

Identification of characteristics 
of individuals in the population 

The fourth parameter of a literacy definition consists of a set 
of characteristics of the people who are the subjects of the literacy 
program. It is almost a cliche to point out that there are individual 
differences among people, differences in their native endowment,
environmentally acquired assets, and motivations. Hence, there is every
reason to believe that their instructional needs will differ, not just in 
how much instruction should be administered, but also in which lite- 
racy skills they should learn and how many of those skills should be 
mastered. While the preceding sections have not entirely ignored the 
characteristics of the individual, neither have they examined them 
systematically. The present section will examine why individual char- 
acteristics must be represented in a model, identify the major varia- 
bles, and then present the outline of the model as far as it will be 
developed by this investigation. 

Inclusion of individual characteristics

The objective of a criterion model is to help us make decisions
about the instruction of people, decisions that will help people
to realize their aspirations while simultaneously conserving their
resources. It is doubtful that a model that omitted individual
characteristics could reach this objective satisfactorily. To omit them would
have the effect of forcing us to apply the same criterion to everyone,
or worse, to allow only a select few to acquire literacy skills. And these
skills would be the ones that were found to be appropriate for the
average person. This would force us into an enormous waste of personal
and social resources and lead us to achieve results that had little
correspondence to the aspirations of individuals and society. Consider the
obvious case of the severely mentally retarded child who cannot possibly
aspire to accomplish much more than bare self-sufficiency in
the simplest kinds of occupations. Almost any criterion that is appropriate
for the average person in a broad segment of the population
would certainly include many skills that the mentally retarded person
would have little occasion to use. Moreover, because he learns very
slowly, his instruction would be very prolonged, expensive to society,
and expensive to him in terms of the proportion of his life that he
would have to devote to the learning task. Conversely, consider the
mentally gifted person who can aspire to occupations of great complexity
and of great benefit to himself and society. He could attain
this criterion rapidly and at little cost to anyone, but his instruction
would be terminated well before he could realize his aspirations.
Hence, it should be clear that the cause of neither justice nor efficiency
is served by applying the same criterion to everyone.

11. Presumably these values would be appropriately adjusted by the usual accounting
procedures so that they would accurately reflect the tradeoff between immediate
and deferred benefits.

Identification of relevant characteristics 

Three factors seem particularly relevant in identifying a literacy
criterion for an individual: his native capacity to learn, his
environmentally acquired capacity to learn, and his motivations. We
will begin by discussing the need to distinguish between the first 2 of 
these factors and then proceed to discuss each factor separately. 

Distinction between native and acquired capacity. In discussions
of instruction it has been customary to lump together native and
acquired capacity to learn under the single labels intelligence or
aptitude. In part, the 2 have been confounded because it has been
difficult to separate them in practical measurement operations. In
tests we use problems of various sorts to measure aptitude for learning,
and responses to those tasks rest on both native and acquired
abilities. However, it now seems essential to distinguish between the
2 concepts both conceptually and in practical measurement operations.
Our society is now making unmistakable demands that education
give greater priority to helping each individual develop his potential
regardless of his social-environmental background. Thus, educators
are being asked to distinguish between biologically and environmentally
acquired capacity to learn, to adapt instructional content
and methods correspondingly, and then to allocate educational 
resources in such a manner that a person's social environmental 
circumstances no longer serve as a major determinant of the edu- 
cational level to which he can aspire. Since the criterion model 
purports to identify for each individual the level of literacy to which
it is most desirable that he aspire, it follows that this distinction 
should also appear in the model. 

It may be objected that the distinction is futile because we 
currently have no way to assess the 2 concepts separately. This is true, 
but we might be able to solve the problem in at least a modestly satis- 
factory manner. We know many of the social-environmental factors 
that correlate with scores on intelligence and aptitude tests — the edu- 
cational attainment of the parents, parental income, and so on. And 
we have estimates of the degrees to which these factors are themselves
heritable. Consequently, we could weight these factors in a person's
environment with some degree of confidence and partial them out
of his test scores, thereby obtaining separate estimates of the biological
and environmental components of his learning capacity.¹²
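The partialling step might look roughly like the sketch below. It uses a single environmental factor (for example, parents' educational attainment) and a single least-squares regression, and it treats the residual as a rough estimate of the non-environmental component. The `environmental_share` parameter is a hypothetical stand-in for the heritability estimates the text mentions; the sketch is not a validated measurement procedure.

```python
from statistics import mean

def partial_out(scores, factor, environmental_share=1.0):
    """Split test scores into an environmental part and a residual part.

    scores: aptitude-test scores; factor: one environmental measure per
    person (e.g. parental educational attainment). environmental_share is
    a hypothetical weight for how much of the factor's association is
    treated as environmental rather than heritable.
    Returns (residual estimates, environmental components).
    """
    mx, my = mean(factor), mean(scores)
    slope = (sum((x - mx) * (y - my) for x, y in zip(factor, scores))
             / sum((x - mx) ** 2 for x in factor))
    environmental = [environmental_share * slope * (x - mx) for x in factor]
    residual = [y - e for y, e in zip(scores, environmental)]
    return residual, environmental
```

With a real population one would use several factors at once (multiple regression), but the single-factor case shows the idea.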

Native capacity for learning. The primary reason for including
capacity in the model is that it is a major determinant of what it
costs the person and society for him to master a given body of content.
Carroll (1963) has shown that aptitude can be operationalized
in either of 2 ways: by the amount that a person is likely to learn
with a fixed amount of instruction or by the amount of time
required for that person to reach a criterion level of performance in
learning some body of content. From the point of view of this model,
we are most interested in the conceptualization of capacity to learn as
time to reach a criterion. Time spent in instruction can be translated
directly into costs. The instruction costs the student an irreplaceable
fraction of his life, a fraction of his life that has value to him to the
extent that he could be using it to produce other things that would
also be satisfying to him. Similarly, the time spent in instruction is a
major source of the costs of education to society. Thus, we cannot very
well identify a criterion without considering the individual's native
capacity to learn.

12. The reader should be alert to the fact that much controversy presently swirls
about the genetic heritability of mental abilities, controversy that may make it socially
unacceptable to represent the distinction between native and acquired ability in a
model. However, if abilities are genetically determined to any important degree, it seems
clear that that "fact" can be ignored only at the cost of visiting considerable
injustice upon those individuals who were poorly endowed by their parents.

Acquired capacity. A person's acquired capacity to learn 
affects costs in much the same manner as biological capacity. But 
acquired capacity must be weighted differently in the model, since 
deficits of this kind can be overcome through instruction and since
society is willing to allocate a considerable portion of its resources to
overcoming them.

Motivations. People differ in the kinds of occupational choices 
they make and in the kinds of cultural and political activities that they 
engage in. Each of these pursuits involves different kinds of reading
tasks and different amounts of reading. Consequently, if this factor is 
not taken into account, much could be wasted by teaching people 
skills that they neither wanted nor would ever use or by failing to 
teach them skills that they needed. Let us refer to this concept of moti- 
vation as the individual's intentions and distinguish it from another 
sense in which the term is used. 

The term motivation is also used to refer to the extent to 
which a person perseveres in attending to a learning task. This is an 
important consideration in the model since it also helps to determine 
the costs of the instruction. If a person perseveres in attending to the 
instruction, the costs will be lower than if he does not. 
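The way aptitude (as time to criterion) and perseverance jointly drive costs can be sketched as follows. The sketch assumes that only a fraction of scheduled instructional time is spent attending, so scheduled time is the needed time divided by that fraction, and it prices each scheduled hour with hypothetical rates for the learner's time and for society's instructional resources. Both the division rule and the rates are illustrative assumptions.

```python
def instruction_cost(hours_to_criterion, perseverance, student_rate, social_rate):
    """Cost of bringing one learner to the criterion.

    hours_to_criterion: aptitude as time, the hours of attentive instruction
    the learner needs. perseverance: assumed fraction (0, 1] of scheduled
    time actually spent attending. student_rate and social_rate:
    hypothetical value per scheduled hour of the learner's time and of
    society's instructional resources.
    """
    if not 0 < perseverance <= 1:
        raise ValueError("perseverance must be in (0, 1]")
    scheduled_hours = hours_to_criterion / perseverance
    return scheduled_hours * (student_rate + social_rate)

# A learner who needs 100 attentive hours but attends half the time
# requires 200 scheduled hours.
print(instruction_cost(100, 0.5, 2, 3))
```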

These 2 concepts of motivation should probably be repre- 
sented quite differently in the model. Intentions should probably be 
represented in the boundary conditions for 2 reasons. First, a person's 
choice of pursuits is not a continuous variable. Rather, it is simply a 
person's choice whether or not to enter each of a number of different
pursuits that have no obvious dimensional continuity. Second, this va- 
riable undoubtedly interacts with the weights assigned to the effects 
that are associated with various levels of mastery of literacy skills. 
That is, the weight that we would assign to Information Gain, for 
example, depends to some extent on the reason that motivated us to 
read in the first place. On the other hand, a person's perseverance is a 
continuous variable and can be treated much as any other variable. 

Form of the model

Thus, when we formalize the model as it now stands, we 
obtain the expression 

$$V(C_{si} \mid I_j) = f(N_w, A_w, P_w, E_{wd})$$

which is to say that the value (V) that a person is likely to accrue as
a result of achieving a given level of performance (i) on a criterion
test (C) which measures a given set of comprehension skills (s), given
that he intends (I) to elect a particular pursuit (j), is some
weighted (w) function of his native capacity to learn (N), his
acquired capacity to learn (A), his ability to persevere on the learning
task (P), and the appropriately weighted and discounted (d) effects
(E) associated with this level of performance on the criterion test.

The weights are the relative values assigned by those people 
affected by where the criterion score is set. This would include a broad 
spectrum of people, including the individual himself or his legitimate 
representative. The effects are discounted in the customary way to 
balance off the advantages of immediate over deferred benefits. The 
effects themselves are variables of the type mentioned in connection 
with establishing task and corpus criterion scores. 
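To make the form of the model concrete, the sketch below assumes that f is a simple linear weighted sum, that effects are yearly estimates discounted at a single compound rate, and that the intended pursuit j acts as a boundary condition by selecting which set of weights applies. None of these choices is prescribed by the model as stated; the names and numbers are hypothetical.

```python
def present_value(stream, rate):
    """Discount yearly effect estimates (year 0 first) to present value."""
    return sum(effect / (1 + rate) ** year for year, effect in enumerate(stream))

def criterion_value(capacities, effects, weights_by_pursuit, pursuit, rate):
    """Hypothetical linear form of V(C_si | I_j).

    capacities: {"N": ..., "A": ..., "P": ...} for one person.
    effects: effect name -> yearly estimates tied to this criterion level.
    weights_by_pursuit: intended pursuit j -> {name: weight}; the intention
    selects which weights apply rather than entering as a variable.
    """
    w = weights_by_pursuit[pursuit]
    value = sum(w[k] * capacities[k] for k in ("N", "A", "P"))
    value += sum(w[name] * present_value(stream, rate) for name, stream in effects.items())
    return value

weights = {"clerk": {"N": -1, "A": -1, "P": 0.5, "income": 2}}  # illustrative only
print(criterion_value({"N": 1, "A": 1, "P": 1}, {"income": [10]}, weights, "clerk", 0.05))
```

Negative weights on N and A here simply illustrate capacities entering as cost determinants; a real application would need weights elicited from the people affected, as the text describes.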

It should be explicitly understood that neither the symbols, 
nor the preceding discussions, prescribe any particular method of
measuring the factors in the model. I have been deliberately vague on
these matters because they raise a question that is logically subsequent
to the one we are addressing here: namely, what should be the
general content and form of the model? However, it should be noted
that the results of this model can never be any more valid than the 
tests and measurements employed to apply it. 

At least 4 criteria are relevant to the evaluation of this model. 
First, the model must be consistent with the values of our society.
Second, it must take into account all of the major classes of variables 
that are relevant to the problem of identifying a criterion score. Third, 
it should be scientifically feasible to operationalize the model in a 
reasonably satisfactory manner. And fourth, the model should be 
practically feasible to apply. It seems clear to me that the model does 
meet all of these criteria, at least at a minimal level. However, I will
leave its detailed examination to the reader.

A rejected model

Before we leave this topic, it seems important to examine 
briefly an alternative model that was explored and then rejected on 
ethical grounds. In recent years we have seen a marked increase in the 
study of the manpower needs of the United States and of the developing
countries of the world. These investigations have dealt mainly
with projecting the needs for personnel who are trained in the highly 
specialized skills involved in some occupation that is essential to the
economies of those nations. But a few, particularly those addressed to 
the economic problems of developing nations, have also dealt with 
education in basic areas such as literacy. These studies attempted to 
build models for identifying how many people should be trained. 
These are models that are analogous to the corpus criterion model in 
which we sought to determine what portion of materials people should 
be able to read. 

This type of model is ethically unacceptable for deciding how 
much reading instruction to give an individual. When it is suggested 
that we might like to know what proportion of people in our society 
should be literate, we are actually indulging in a euphemistic phrasing
of the question of what proportion of the people in our society should 
be forced into illiteracy. And this implies that the policy makers have 
the right to decide who should learn to read and who should not. It is 
true, of course, that our educational resources are limited and that we 
may not be able to raise everyone to the level of literacy that we would 
ideally prefer. However, it seems unacceptable to allocate resources in 
this way. Suppose that it is partially true that the ability to read com- 
petently is an essential prerequisite to the exercise of our rights and 
responsibilities as citizens and to the participation in the social, eco- 
nomic, and cultural benefits of our society. And suppose that there is 
also some truth to the proposition that cultural advantage and disad- 
vantage tend to perpetuate themselves. It seems that with this model 
we would be delegating to policy makers the right to create a caste 
system in our society. 

Now, it remains true that our society and every other society 
can make the most of its resources by forecasting its future needs and 
by setting goals for meeting them. And we need information in order 
to do this. But that information is useless if it is cast in a form that is 
unacceptable to society. In our society we are willing to accept differences
in the allocation of resources among people and differences in
people's eventual levels of attainment. But we prefer that those differences
be explicitly determined by personal choice of the individual
himself and by biological factors. Hence, we must reject this type of 
model. 



A final comment 

In essence, this article has been seeking a useful way to ask 
the question: How well should a person learn to read? We quickly
rejected the arbitrary criteria previously used and then went on to 
reject models based on simplistic notions such as more is better and 
that perfect mastery is ideal. We also rejected partial solutions to this
problem by recognizing that a person's literacy was jointly determined 
by both his reading ability and the readability of the materials that he
needed to read. Instead, we chose to think in terms of models that 
regarded a person as literate when he could perform well enough to 
obtain the maximum value from the materials he needed to read. 
Consequently, we thereupon set out to examine, on the one hand, 
models that might tell us when a person was literate with respect to a 
single material or a corpus of materials and, on the other hand,
models that might tell us when a person could read well enough to 
achieve his aspirations. Each of these models could probably be used 
with present techniques and produce modestly believable results, 
although each would be greatly improved if it received the benefit of 
further conceptual analysis and research. However, we must realize 
that these models, no matter how well they may be developed in the 
future, provide only preliminary and partial answers to the central 
question — how well a person should learn to read, given that literacy 
is jointly determined by reading ability and readability. The ultimate 
purpose of investigations of this sort is to help us make maximum use 
of our resources in realizing our goals, and this cannot be fully 
achieved until we have developed a model that permits us to jointly 
identify a criterion of literacy and readability.